Abstract

We consider the scenario in which multiple sensors send spatially correlated data to a fusion center (FC) via independent Rayleigh-fading channels with additive noise. Assuming that the sensor data is sparse in some basis, we show that the recovery of this sparse signal can be formulated as a compressive sensing (CS) problem. To model the scenario in which the sensors operate with intermittently available energy that is harvested from the environment, we propose that each sensor transmits independently with some probability, and adapts the transmit power to its harvested energy. Due to the probabilistic transmissions, the elements of the equivalent sensing matrix are not Gaussian. Moreover, since the sensors have different energy harvesting rates and different sensor-to-FC distances, the FC has different receive signal-to-noise ratios (SNRs) for each sensor. This is referred to as the inhomogeneity of SNRs. Thus, the elements of the sensing matrix are also not identically distributed. For this unconventional setting, we provide theoretical guarantees on the number of measurements for reliable and computationally efficient recovery, by showing that the sensing matrix satisfies the restricted isometry property (RIP) under reasonable conditions. We then compute an achievable system delay under an allowable mean-squared-error (MSE). Furthermore, using techniques from large deviations theory, we analyze the impact of inhomogeneity of SNRs on the so-called k-restricted eigenvalues, which govern the number of measurements required for the RIP to hold. We conclude that the number of measurements required for the RIP is not sensitive to the inhomogeneity of SNRs when the number of sensors n is large and the sparsity of the sensor data (signal) k grows slower than the square root of n. Our analysis is corroborated by extensive numerical results.

Index Terms 

Wireless compressive sensing, Energy harvesting, Restricted isometry property, Compressive sensing, 
Wireless sensor networks, Rayleigh-fading channels, Large deviations 

G. Yang, S. H. Ting and Y. L. Guan are with the School of Electrical and Electronic Engineering, Nanyang Technological 
University, Singapore (e-mail: yang0305@e.ntu.edu.sg; {shting, eylguan}@ntu.edu.sg). G. Yang is supported in part by the Advanced
Communications Research Program DSOCL06271, a research grant from the Directorate of Research and Technology (DRTech), 
Ministry of Defence, Singapore. 

V. Y. F. Tan and C. K. Ho are with the Institute for Infocomm Research, A*STAR, Singapore (e-mail: {tanyfv, hock}@i2r.a-star.edu.sg). V. Y. F. Tan is also with the Department of Electrical and Computer Engineering, National University of Singapore.









November 7, 2012 



DRAFT 






I. Introduction 

The lifetimes of conventional wireless sensor networks (WSNs) are limited by the total energy 
available in the batteries. It is inconvenient to replace batteries periodically, or even impossible 
when sensors are deployed in harsh conditions, e.g., in toxic environments or inside human bodies. 
Harvesting ambient energy, such as solar, wind, thermal and piezoelectric energy, appears to be a promising alternative to a fixed-energy battery, to prolong the lifetime and offer potentially maintenance-free operation for WSNs [1], [2]. Compared to the limited but reliable power supply from conventional batteries, energy harvesters provide a virtually perpetual but unreliable energy source.
Moreover, the sensors typically have different energy harvesting rates, due to varying harvesting 
conditions such as the spread of sunlight and difference in wind speeds. 

This paper addresses the problem of data transmission in energy harvesting WSNs (EH-WSNs). We assume that energy harvesting sensors are deployed to monitor some physical phenomenon in space, e.g., temperature or gas toxicity. Data collected from the sensors are sent to the fusion center (FC). The data are typically correlated, and well approximated by a sparse vector in an appropriate transform domain (e.g., the Fourier transform). Recent developments in compressive sensing (CS) theory provide efficient methods to recover sparse signals from limited measurements [3]. CS theory states that if the sensing matrix satisfies the restricted isometry property (RIP), a small number of measurements (relative to the length of the data vector) is sufficient to accurately recover the sparse data. This advantage of CS potentially allows us to reduce the total number of transmissions, which is particularly important for data transmission over bandwidth-limited wireless channels.

The accurate estimation of the sensor data by the FC has recently been addressed using CS techniques. In [4], Haupt et al. presented a sensing scheme based on phase-coherent transmissions for all sensors. However, [4] made two practically limiting assumptions. First, it assumed that there was no channel fading and that the path losses of all sensors were identical. Second, the transmissions from all sensors were synchronized such that the signals arrived in phase at the FC. In [5], Aeron et al. derived information-theoretic bounds on the sensing capacity of sensor networks under a fixed signal-to-noise ratio (SNR) for all sensors. In contrast, [6] proposed a sparse approximation method in non-fading channels, which adapted a sensor's sensing activity according to its energy availability. In [7], Xue et al. successively applied CS in the spatial domain and the time domain, under a fixed SNR for all sensors. In [8], Fazel et al. proposed a random access scheme for underwater sensor networks. Each activated sensor picked a uniformly distributed delay to transmit. By simply discarding the data packets that collided during concurrent medium access, the FC used a CS decoder to recover the sensor data based only on the successfully received packets. Thus, the scheme did not exploit packet collisions for data recovery.

Since sensors are placed at different locations, it is commonly assumed that the sensors transmit 
data over independent but nonidentical channels with different fading conditions. Different energy 
harvesting rates also lead to different transmit powers and hence different (receive) SNRs. We refer 
to this generally as the inhomogeneity of SNRs. The application of wireless compressive sensing to 
the scenario of inhomogeneous SNRs has, to the best of our knowledge, not been studied in the 
literature. We define the system delay as the number of concurrent sensor-to-FC transmissions (or 
channel uses) for estimating one data vector (among sensors). We aim to reduce the system delay, 
while ensuring a target estimation accuracy. Surprisingly, we observe that the number of measurements m required for accurate recovery is not overly sensitive to the inhomogeneity of SNRs, provided that the number of sensors n is large and the sparsity k of the data vector grows slower than $\sqrt{n}$. This motivates us to further investigate the impact of inhomogeneity of SNRs on the recovery performance, in terms of the RIP.

The three main contributions are summarized as follows. 

1) We first present an efficient transmission scheme, which features probabilistic transmission by sensor nodes. In each time slot, every sensor locally decides to transmit with some probability, and adjusts the transmit power according to its energy availability. The FC thus receives a linear combination of signals that are transmitted from a random subset of sensors. The transmissions over successive time slots result in a sensing matrix that is effectively realized through the mixing of signals in wireless channels.

2) Second, we prove that the FC can recover the data accurately if the total number of transmissions (or measurements) m exceeds

$$ O\left(\frac{\rho_{\max}(k)}{\rho_{\min}(k)}\, k\log\frac{n}{k}\right), $$

where n is the number of sensors, k is the sparsity of the sensor data, and $\rho_{\max}(k)$ and $\rho_{\min}(k)$ are respectively the maximum and minimum k-restricted eigenvalues (see the definition in (15)) of a Gram matrix that depends on the inhomogeneity of SNRs. Different from previous works on CS, our bound depends explicitly on the ratio $\rho_{\max}(k)/\rho_{\min}(k)$, which is the k-restricted condition number of the Gram matrix. Based on this result, we also compute the achievable system delay subject to a desired recovery accuracy.

3) Third, we analyze the impact of inhomogeneity of SNRs on the required number of measurements, in terms of $\rho_{\max}(k)$ and $\rho_{\min}(k)$. We model the signal powers of the sensors as independent truncated Gaussians. By using the theory of large deviations, we show that both $\rho_{\max}(k)$ and $\rho_{\min}(k)$ concentrate around one (for any constant k) in the large-n regime, and that the rate of convergence to one depends on the inhomogeneity of SNRs. This explains the observation that the inhomogeneity of SNRs does not significantly affect the number of measurements required for the RIP to hold.
The remainder of this paper is organized as follows: Section II provides a description of the system model. Section III presents a new wireless compressive sensing scheme. Section IV details the main results on the RIP and the achievable system delay, and investigates the impact of inhomogeneity of SNRs. Section V provides the simulation results. Section VI concludes this paper. The proofs of the RIP result and of the result on the impact of inhomogeneity of SNRs are given in Section VII.

We adopt the following notation in this paper: lower case letters denote deterministic scalars, and lower case Greek letters denote constants or angles. Boldface upper case and boldface lower case letters refer to matrices and (column) vectors, respectively. We use upper case letters to denote random variables. Sets are denoted with calligraphic font (e.g., $\mathcal{V}$). The cardinality of a finite set $\mathcal{V}$ is denoted as $|\mathcal{V}|$. The n-order identity matrix is denoted by $\mathbf{I}_n$. We also use $\mathbb{R}^n$ and $\mathbb{C}^n$ to denote the n-dimensional real and complex Euclidean spaces, respectively.

II. System Model

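Consider a wireless sensor network that consists of n energy harvesting sensor nodes and an FC. Sensors transmit their data to the FC via a shared multiple-access channel (MAC). We consider slotted transmissions, starting with a single snapshot of the spatial-temporal field. Assuming the sensor data $\mathbf{s}$ is compressible, we can model it as being sparse with respect to (w.r.t.) a fixed orthonormal basis $\{\boldsymbol{\psi}_j \in \mathbb{C}^n : j = 1, \dots, n\}$, i.e.,

$$ \mathbf{s} = \sum_{j=1}^{n} x_j\boldsymbol{\psi}_j = \boldsymbol{\Psi}\mathbf{x}, \tag{1} $$

where $\boldsymbol{\Psi} = [\boldsymbol{\psi}_1, \dots, \boldsymbol{\psi}_n]$, $\mathbf{x} \in \mathbb{C}^n$ has at most $k \le \lfloor n/2 \rfloor$ non-zero components, and $\lfloor\cdot\rfloor$ is the floor operation.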
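As a minimal illustration of (1), the NumPy sketch below draws a k-sparse coefficient vector and synthesizes the sensor data s = Ψx. It assumes that Ψ is the unitary n-point DFT matrix (the basis adopted later in Section IV-C) and that the non-zero coefficients are complex Gaussian; the helper names `dft_basis` and `k_sparse_vector` are hypothetical.

```python
import numpy as np


def dft_basis(n: int) -> np.ndarray:
    """Unitary n-point DFT matrix; column j plays the role of psi_j in (1)."""
    return np.fft.fft(np.eye(n)) / np.sqrt(n)


def k_sparse_vector(n: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Complex vector with exactly k non-zero entries on a uniformly random support."""
    x = np.zeros(n, dtype=complex)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
    return x


rng = np.random.default_rng(0)
n, k = 64, 4
Psi = dft_basis(n)
x = k_sparse_vector(n, k, rng)
s = Psi @ x  # sensor data s = Psi x, cf. (1)
```

Because Ψ is unitary, the coefficients are recovered exactly as Ψ^H s, which is why recovering x and recovering s are equivalent problems.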

We assume a flat-fading channel with complex-valued channel coefficients $h_{ij}$, where $1 \le i \le m$ denotes the slot index and $1 \le j \le n$ denotes the sensor index. The channel remains constant in each slot. We further assume a Rayleigh-fading channel, hence the channel coefficients for different slots are independent and identically distributed (i.i.d.) according to the complex Gaussian distribution.

Fig. 1. The MAC communication structure for WSNs in the i-th time slot. The signals that are concurrently transmitted from sensors to the FC are linearly combined over the air.

We propose that sensors concurrently transmit to the FC in a probabilistic manner, such that the signals from the sensors are linearly combined over the air. Sensor j multiplies its datum $s_j$ by some random amplitude $\phi_{ij}$ (to be defined in (4)), then transmits in the i-th time slot. The FC thus receives

$$ y_i = \sum_{j=1}^{n} h_{ij}\phi_{ij}s_j + e_i, \tag{2} $$

where $e_i$ is a noise term (not necessarily Gaussian). After m time slots, the FC receives the measurement vector

$$ \mathbf{y} = (\mathbf{H} \odot \boldsymbol{\Phi})\mathbf{s} + \mathbf{e} = \mathbf{Z}\mathbf{s} + \mathbf{e} = \mathbf{Z}\boldsymbol{\Psi}\mathbf{x} + \mathbf{e}, \tag{3} $$

where $\mathbf{Z} = \mathbf{H} \odot \boldsymbol{\Phi}$, and $\odot$ denotes the element-wise product of two matrices. We assume all noise components are independent, zero mean and have variance $\sigma^2$. The signal model over one slot is illustrated in Fig. 1.
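The following sketch simulates one frame of the measurement model (2)-(3) and checks that the matrix form agrees with the slot-by-slot sum. The per-sensor fading variances, the squared amplitudes, the Gaussian noise (the text allows non-Gaussian noise) and all numeric values are illustrative assumptions; Φ is drawn as described in (4).

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p, sigma2 = 32, 16, 0.8, 0.01       # illustrative sizes, transmit probability, noise variance
nu2 = rng.uniform(0.5, 2.0, size=n)       # assumed fading variances nu_j^2 (column-dependent)
b = rng.uniform(0.5, 1.5, size=n)         # assumed squared amplitudes b_j

# Rayleigh fading: h_ij ~ CN(0, nu_j^2), i.e. real and imaginary parts N(0, nu_j^2 / 2)
H = np.sqrt(nu2 / 2) * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))
# Selection-and-weight entries as in (4): +-sqrt(b_j) w.p. p/2 each, 0 w.p. 1 - p
active = rng.random((m, n)) < p
signs = rng.choice([-1.0, 1.0], size=(m, n))
Phi = np.where(active, signs * np.sqrt(b), 0.0)

s = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # any data vector works for this check
e = np.sqrt(sigma2 / 2) * (rng.standard_normal(m) + 1j * rng.standard_normal(m))

Z = H * Phi          # element-wise product H ⊙ Φ
y = Z @ s + e        # matrix form (3)

# Slot-by-slot form (2): y_i = sum_j h_ij * phi_ij * s_j + e_i
y_slots = np.array([np.sum(H[i] * Phi[i] * s) + e[i] for i in range(m)])
```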

From the perspective of signal recovery, we want to estimate $\mathbf{x}$, or equivalently $\mathbf{s}$, from $\mathbf{y}$, such that the mean-squared-error (MSE) $\mathbb{E}\|\hat{\mathbf{x}} - \mathbf{x}\|_2^2$ does not exceed some threshold $\epsilon$. Also, due to limited channel resources, we would like to estimate the sparse vector using minimum network resources (i.e., channel uses). Thus, given a fixed number of sensors n and a threshold $\epsilon$, our objective is to design a transmission scheme that minimizes the number of sensor-to-FC transmissions m.
Different from [6], [?], we consider Rayleigh-fading channels, and adopt concurrent transmissions in a probabilistic manner. Moreover, the SNRs of different sensors are allowed to differ, in contrast to the fixed-SNR case in the literature [4], [5], [7].

III. Energy-Aware Wireless Compressive Sensing 

In Section III-A, we first provide a CS perspective for the signal model in (3). Then in Section III-B, we present an energy-aware wireless compressive sensing scheme. By taking into account the inhomogeneity of SNRs, we also derive the probability density function (pdf) of the elements of the random matrix $\mathbf{Z}$ in Section III-C, which will be used to show the RIP in Section IV-A.

A. A Compressive Sensing Perspective 

Since we assume the data vector x is sparse in some basis, it seems natural to adopt a CS method to recover x. The over-the-air combination via the channel matrix H produces the equivalent sensing matrix Z in (3). However, there are two differences from the conventional CS setup that make the analysis more challenging.

• Due to probabilistic transmissions, the elements of the sensing matrix Z are not Gaussian. 

• Since sensors have different energy harvesting rates and different sensor-to-FC distances, the FC has different receive SNRs for different sensors. Thus, the elements of the sensing matrix Z are also not identically distributed.

The proposed transmission scheme calls for the analysis of non-Gaussian non-i.i.d. sensing matrices. 
Hence, we need to analyze the system performance in a more intricate way that differs from 
conventional CS problems. The key technique we employ is to show that the elements of the 
sensing matrix Z are sub-Gaussian, and make use of new results on sub-Gaussian random matrices. 

B. Energy-Aware Wireless Compressive Sensing 

We consider only the energy consumption for wireless transmissions, by assuming the energy 
consumption on sensing is negligible. The energy harvesting rate varies over sensors. For simplicity, 
we assume that each sensor allocates the same power for all slots. Let $E_j$ be the accumulated harvested energy that is available for sensor j to transmit in each slot. We perform energy-aware wireless transmissions taking into account the different available energy. Note that a causal energy constraint arising from energy harvesting must be satisfied, i.e., the energy consumed for transmissions cannot exceed the energy available in each slot.

Set a probability $p \in (0, 1]$ and a squared amplitude $b_j > 0$. Let $\boldsymbol{\Phi}$ in (3) be a selection-and-weight (SW) matrix, whose elements are independently generated according to the random variable

$$ \phi_{ij} = \begin{cases} +\sqrt{b_j} & \text{w.p. } p/2, \\ 0 & \text{w.p. } 1-p, \\ -\sqrt{b_j} & \text{w.p. } p/2, \end{cases} \qquad \forall i = 1, 2, \dots, m. \tag{4} $$

That is, sensor j transmits with probability p with an amplitude of $\sqrt{b_j}$, and the actual value is positive or negative with equal probability. Given the available energy $E_j$, we choose $b_j$ such that¹

$$ p b_j \le E_j, \qquad \forall j = 1, 2, \dots, n. \tag{5} $$

Clearly, each entry $\phi_{ij}$ is zero mean and has variance $p b_j$. The causal energy constraint is satisfied in expectation, i.e., $\mathbb{E}(\phi_{ij}^2) = p b_j \le E_j$. This allows us to save energy to be used for future
transmissions. The energy-saving feature can be crucial in the scenario where the energy harvesting 
rates are fluctuating over several snapshots of the spatial-temporal field. It is, however, beyond the 
scope of this paper to optimize for the bj's. 
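A minimal sketch of the SW matrix in (4), together with an empirical check that the causal energy constraint holds in expectation. Choosing b_j = E_j/p, the largest value permitted by (5), is only an illustrative assumption (the paper leaves the b_j unoptimized, and a smaller b_j would save energy for later snapshots); the function names are hypothetical.

```python
import numpy as np


def sw_matrix(m: int, b: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Draw an m x n selection-and-weight matrix with independent entries as in (4)."""
    b = np.asarray(b, dtype=float)
    active = rng.random((m, b.size)) < p                  # sensor j transmits w.p. p
    signs = rng.choice([-1.0, 1.0], size=(m, b.size))     # +/- with equal probability
    return np.where(active, signs * np.sqrt(b), 0.0)


def largest_feasible_amplitudes(E: np.ndarray, p: float) -> np.ndarray:
    """Hypothetical choice b_j = E_j / p, which meets (5) with equality."""
    return np.asarray(E, dtype=float) / p


rng = np.random.default_rng(2)
p = 0.6
E = np.array([0.2, 1.0, 3.0])            # assumed per-slot harvested energies E_j
b = largest_feasible_amplitudes(E, p)
Phi = sw_matrix(200_000, b, p, rng)      # many slots, only to estimate moments

mean_amplitude = Phi.mean(axis=0)        # should be close to 0
mean_energy = (Phi ** 2).mean(axis=0)    # estimates E(phi_ij^2) = p * b_j <= E_j
```

Each sensor decides independently in every slot, so the draw needs no coordination between columns, matching the distributed selection described in the next paragraph.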

In [4], all sensors consume the same amount of energy for transmissions. In contrast, each sensor here adapts the transmit power to its available energy via the above-designed SW matrix. Furthermore, the SW matrix randomly selects the sensors to transmit, and weights the data according to the sensors' harvested energy. In each time slot, a subset of sensors is selected at random to perform transmissions and over-the-air combination. The selections are performed in a distributed manner at each sensor node, since each node separately decides the slots in which it transmits. We couple random sensor selection and energy-aware transmission through the choice of the SW matrix.

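Recall the signal model in (3), i.e., $\mathbf{y} = \mathbf{Z}\boldsymbol{\Psi}\mathbf{x} + \mathbf{e}$. With knowledge² of the matrix $\mathbf{Z}$ and the sparsity-inducing basis $\boldsymbol{\Psi}$, the FC can implement CS decoding to recover the sparse coefficients $\mathbf{x}$ and obtain the estimated data vector $\hat{\mathbf{s}} = \boldsymbol{\Psi}\hat{\mathbf{x}}$.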
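The text leaves the CS decoder generic (Lemma 2 later refers to ℓ1-minimization). As a hedged, self-contained stand-in, the sketch below uses orthogonal matching pursuit (OMP), a greedy CS decoder that is a different technique from ℓ1-minimization. It builds the effective matrix A = Z̄ΓΨ/√P_ave of the normalized model in Section III-C, with an assumed signal power pattern and noiseless measurements; `omp` is a hypothetical helper, and exact recovery is not guaranteed for every random draw.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal matching pursuit: pick k columns greedily, refit by least squares."""
    n = A.shape[1]
    col_norms = np.linalg.norm(A, axis=0)
    col_norms[col_norms == 0] = 1.0                  # guard against all-zero columns
    support = []
    residual = y.astype(complex)
    coef = np.zeros(0, dtype=complex)
    for _ in range(k):
        scores = np.abs(A.conj().T @ residual) / col_norms
        if support:
            scores[support] = -1.0                   # never reselect a chosen column
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n, dtype=complex)
    x_hat[support] = coef
    return x_hat


rng = np.random.default_rng(3)
m, n, k, p = 40, 64, 3, 0.8
Psi = np.fft.fft(np.eye(n)) / np.sqrt(n)
gamma = rng.uniform(0.5, 1.5, n)                     # assumed receive signal powers
gauss = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
Z_bar = np.where(rng.random((m, n)) < p, gauss / np.sqrt(2 * p * m), 0.0)   # N_c(0, 1/(pm); p)
A = Z_bar @ np.diag(np.sqrt(gamma)) @ Psi / np.sqrt(gamma.mean())

x = np.zeros(n, dtype=complex)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
y = A @ x                                            # noiseless, for illustration only
x_hat = omp(A, y, k)
s_hat = Psi @ x_hat                                  # estimated data vector
```

OMP is used here only because it fits in a few lines without an optimization library; an ℓ1 solver would replace `omp` without changing the surrounding setup.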

¹ The quantity $b_j$ can be written more generally as $b_{ij}$, which allows the transmit powers to differ across slots. To reduce the processing complexity, we allocate the same power to all the slots.

² The assumption that the FC knows $\mathbf{Z}$ and $\boldsymbol{\Psi}$ is reasonable, because the FC can perform channel estimation from preambles, and obtain the information on the amount of harvested energy via feedback. The channel and energy information is used for generating the SW matrix from a predefined set of SW matrices. Global parameters such as m and p can be broadcast to all sensors.
C. Probability Distribution Analysis and Equivalent Normalized Signal Model

Consider the signal model in (3). Denote each element of $\mathbf{Z}$ as $Z_{ij} = h_{ij}\phi_{ij} = Z_{ij}^R + jZ_{ij}^I$, where $Z_{ij}^R = h_{ij}^R\phi_{ij}$ and $Z_{ij}^I = h_{ij}^I\phi_{ij}$. Note that the elements of the matrix $\mathbf{H}$ are assumed to be independent, and each element has independent real and imaginary components. Also, the matrix $\boldsymbol{\Phi}$ consists of independent elements. All elements of the matrix $\mathbf{Z}$ are thus independent, and have independent real and imaginary components. As such, it suffices to analyze the probability distribution of the real component, since the analysis is similar for the imaginary component. The marginal pdf of $Z_{ij}^R$ can be shown to be

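$$ f_{Z_{ij}^R}(z) = \frac{p}{2\sqrt{b_j}}\, f_{H_{ij}^R}\!\left(\frac{z}{\sqrt{b_j}}\right) + \frac{p}{2\sqrt{b_j}}\, f_{H_{ij}^R}\!\left(-\frac{z}{\sqrt{b_j}}\right) + (1-p)\,\delta(z), \tag{6} $$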

where $f_{H_{ij}^R}(\cdot)$ is the pdf of the real part of the channel coefficient of sensor j, and $\delta(\cdot)$ is the Dirac delta function. For the sake of brevity, we define a new pdf as follows.

Definition 1. A random variable X follows a mixed Gaussian distribution, denoted as $X \sim \mathcal{N}(\mu, \nu^2; p)$, if its pdf has the following form

$$ f_X(x) = p\,\frac{1}{\sqrt{2\pi\nu^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\nu^2}\right) + (1-p)\,\delta(x), \tag{7} $$

where $p \in (0, 1]$ is the mixing parameter. The corresponding complex mixed Gaussian distribution, assuming the real and imaginary components are independent, is denoted as $\mathcal{N}_c(\mu, \nu^2; p)$.
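Definition 1 can be sampled directly: draw the Gaussian component and keep it with probability p, otherwise return an exact zero. The function name and the test parameters below are illustrative assumptions.

```python
import numpy as np


def sample_mixed_gaussian(mu, nu2, p, size, rng):
    """Draw i.i.d. samples of X ~ N(mu, nu2; p) as in (7)."""
    gaussian = mu + np.sqrt(nu2) * rng.standard_normal(size)   # N(mu, nu2) component
    keep = rng.random(size) < p                                 # Bernoulli(p) selector
    return np.where(keep, gaussian, 0.0)                       # point mass at zero w.p. 1 - p


rng = np.random.default_rng(4)
samples = sample_mixed_gaussian(mu=1.0, nu2=4.0, p=0.3, size=200_000, rng=rng)
nonzero = samples[samples != 0.0]
```

The Gaussian component hits exactly zero with probability zero, so the zero entries identify the (1 − p)-mass atom in (7).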

Assuming Rayleigh-fading channels, all elements in the channel matrix $\mathbf{H}$ are independent, zero mean and Gaussian. Note that due to the different fading channels of the sensors, the matrix $\mathbf{H}$ has column-dependent variances, where the j-th column follows a complex Gaussian distribution with variance $\nu_j^2$. From (6) and (7), the marginal pdf of $Z_{ij}^R$ can be expressed as

$$ f_{Z_{ij}^R}(z) = p\,\frac{1}{\sqrt{\pi b_j\nu_j^2}}\exp\!\left(-\frac{z^2}{b_j\nu_j^2}\right) + (1-p)\,\delta(z). \tag{8} $$

Thus, we have $Z_{ij}^R \sim \mathcal{N}(0, b_j\nu_j^2/2; p)$.
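The following Monte Carlo sketch checks (8) for one sensor: multiplying the real part of a CN(0, ν_j²) coefficient by φ_ij from (4) should give an atom of mass 1 − p at zero plus a zero-mean Gaussian part with variance b_jν_j²/2. The parameter values are arbitrary assumptions.

```python
import math

import numpy as np

rng = np.random.default_rng(5)
N, p, b_j, nu2_j = 400_000, 0.7, 2.0, 0.5           # assumed parameters for one sensor j
h_re = np.sqrt(nu2_j / 2) * rng.standard_normal(N)  # real part of h_ij ~ CN(0, nu_j^2)
phi = np.where(rng.random(N) < p, rng.choice([-1.0, 1.0], size=N) * np.sqrt(b_j), 0.0)
z_re = h_re * phi                                   # Z^R_ij = h^R_ij * phi_ij

var_pred = b_j * nu2_j / 2                          # variance of the Gaussian part in (8)
zero_fraction = np.mean(z_re == 0.0)                # (8) predicts 1 - p
nonzero = z_re[z_re != 0.0]
t = 0.5
empirical_cdf = np.mean(np.abs(nonzero) <= t)
gaussian_cdf = math.erf(t / math.sqrt(2 * var_pred))  # P(|N(0, var_pred)| <= t)
```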

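Recall that $\mathbf{Z} = \mathbf{H} \odot \boldsymbol{\Phi}$. Let $\mathbf{H} = \bar{\mathbf{H}}\boldsymbol{\Gamma}_H$ and $\boldsymbol{\Phi} = \bar{\boldsymbol{\Phi}}\boldsymbol{\Gamma}_\Phi$, where $\boldsymbol{\Gamma}_H = \mathrm{diag}\{\nu_1, \nu_2, \dots, \nu_n\}$ and $\boldsymbol{\Gamma}_\Phi = \mathrm{diag}\{\sqrt{pb_1}, \sqrt{pb_2}, \dots, \sqrt{pb_n}\}$. Then we can decompose the matrix $\mathbf{Z}$ as follows

$$ \mathbf{Z} = (\bar{\mathbf{H}} \odot \bar{\boldsymbol{\Phi}})\,\boldsymbol{\Gamma}_H\boldsymbol{\Gamma}_\Phi = \sqrt{m}\,\bar{\mathbf{Z}}\boldsymbol{\Gamma}, \tag{9} $$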
where we denote $\bar{\mathbf{Z}} = (\bar{\mathbf{H}} \odot \bar{\boldsymbol{\Phi}})/\sqrt{m}$ and $\boldsymbol{\Gamma} = \boldsymbol{\Gamma}_H\boldsymbol{\Gamma}_\Phi$. Let $\boldsymbol{\Gamma} = \mathrm{diag}\{\sqrt{\gamma_1}, \sqrt{\gamma_2}, \dots, \sqrt{\gamma_n}\}$, where the receive signal power of sensor j is³ $\gamma_j = p b_j\nu_j^2$. We term the diagonal elements of $\boldsymbol{\Gamma}$ a signal power pattern. The $\gamma_j$'s are generally all different (i.e., inhomogeneous signal powers), and this directly leads to inhomogeneous (receive) SNRs. We note that all elements of the matrix $\bar{\mathbf{Z}}$ are i.i.d. mixed Gaussian random variables, i.e., $\bar{Z}_{ij} \sim \mathcal{N}_c(0, 1/(pm); p)$ and $\bar{Z}_{ij}^R \sim \mathcal{N}(0, 1/(2pm); p)$.
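A short numerical check of the decomposition (9), under assumed fading standard deviations ν_j and squared amplitudes b_j: building H = H̄Γ_H and Φ = Φ̄Γ_Φ and forming Z = H ⊙ Φ reproduces √m Z̄Γ with Z̄ = (H̄ ⊙ Φ̄)/√m and γ_j = pb_jν_j².

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, p = 20, 10, 0.5
nu = rng.uniform(0.5, 2.0, n)       # assumed fading standard deviations nu_j
b = rng.uniform(0.5, 1.5, n)        # assumed squared amplitudes b_j

# Unit-variance building blocks: H_bar ~ CN(0, 1); Phi_bar is +-1/sqrt(p) w.p. p/2 each, 0 otherwise
H_bar = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
Phi_bar = np.where(rng.random((m, n)) < p, rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(p), 0.0)

Gamma_H = np.diag(nu)
Gamma_Phi = np.diag(np.sqrt(p * b))
H = H_bar @ Gamma_H
Phi = Phi_bar @ Gamma_Phi           # non-zero entries are +-sqrt(b_j), as in (4)
Z = H * Phi

Z_bar = H_bar * Phi_bar / np.sqrt(m)
gamma = p * b * nu ** 2             # receive signal powers gamma_j
Gamma = np.diag(np.sqrt(gamma))
```

Right-multiplying by a diagonal matrix only rescales columns, which is why the Hadamard product factors as (H̄ ⊙ Φ̄)Γ_HΓ_Φ.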
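Using the equivalent expression in (9), we rewrite the signal model in (3) as

$$ \mathbf{y} = \sqrt{m}\,\bar{\mathbf{Z}}\boldsymbol{\Gamma}\boldsymbol{\Psi}\mathbf{x} + \mathbf{e}, \tag{10} $$

where the matrix $\boldsymbol{\Psi}$ is unitary. The distinct signal powers in $\boldsymbol{\Gamma}$ are spread along the sparsity-inducing basis vectors (i.e., the columns of $\boldsymbol{\Psi}$).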

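A matrix (or more precisely, a sequence of matrices) is said to be standard column regular if all its elements are uniformly bounded by some constant [10]. For analytical convenience, we normalize the matrix $\boldsymbol{\Gamma}\boldsymbol{\Psi}$ to be standard column regular. The normalization constant is $\|\boldsymbol{\Gamma}\boldsymbol{\Psi}\|_F/\sqrt{n} = \sqrt{P_{\mathrm{ave}}}$, where $P_{\mathrm{ave}} = \sum_{j=1}^{n} p b_j\nu_j^2/n = \sum_{j=1}^{n}\gamma_j/n$ denotes the average (receive) signal power in one time slot. Then the normalized matrix

$$ \boldsymbol{\Sigma} = \boldsymbol{\Gamma}\boldsymbol{\Psi}/\sqrt{P_{\mathrm{ave}}} \tag{11} $$

has bounded spectral norm. By dividing both sides of (10) by $\sqrt{mP_{\mathrm{ave}}}$, we obtain the normalized signal model

$$ \bar{\mathbf{y}} = \bar{\mathbf{Z}}\boldsymbol{\Sigma}\mathbf{x} + \bar{\mathbf{e}} = \mathbf{A}\mathbf{x} + \bar{\mathbf{e}}, \tag{12} $$

where $\bar{\mathbf{y}} = \mathbf{y}/\sqrt{mP_{\mathrm{ave}}}$ and $\bar{\mathbf{e}} = \mathbf{e}/\sqrt{mP_{\mathrm{ave}}}$, and all noise components are independent, zero mean and have normalized variance $\bar{\sigma}^2 = \sigma^2/(mP_{\mathrm{ave}})$. The average SNR is defined as

$$ \mathrm{SNR}_{\mathrm{ave}} \triangleq \frac{P_{\mathrm{ave}}}{\sigma^2} = \frac{1}{m\bar{\sigma}^2}. \tag{13} $$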
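The normalization (11)-(13) is easy to reproduce numerically. The sketch below assumes a DFT basis and an arbitrary signal power pattern. For this basis, every column of Σ has unit norm because each entry of Ψ has magnitude 1/√n, which gives the Gram matrix Σ^HΣ a unit diagonal; that is the property behind the trace argument in the proof of Lemma 1.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, sigma2 = 16, 8, 0.05
gamma = rng.uniform(0.2, 3.0, n)            # assumed receive powers gamma_j = p * b_j * nu_j^2
Psi = np.fft.fft(np.eye(n)) / np.sqrt(n)    # unitary DFT basis (assumption)

P_ave = gamma.mean()                        # average receive signal power
Gamma_Psi = np.diag(np.sqrt(gamma)) @ Psi
Sigma = Gamma_Psi / np.sqrt(P_ave)          # normalized matrix (11)
sigma_bar2 = sigma2 / (m * P_ave)           # normalized noise variance in (12)
snr_ave = P_ave / sigma2                    # average SNR (13)
```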

IV. Main Results 

Having derived the probability distribution of the elements of the matrix Z in Section III-C, we recall the definition of the RIP [11] and state our main result, Theorem 1, in Section IV-A. The engineering implication of Theorem 1, and in particular the tradeoff between the achievable system delay and the allowable MSE, will be discussed in Section IV-B. Finally, we analyze the effect of inhomogeneity of SNRs on the RIP and on the required number of measurements in Section IV-C.

³ The receive signal power depends on both the channel condition (i.e., the variance $\nu_j^2$ of the fading coefficients) and the average transmit power $pb_j$, which is governed by the accumulated harvested energy.

A. Restricted Isometry Property 

It is well established in CS theory that a sufficient condition for accurate and efficient recovery (via convex optimization) is that the sensing matrix satisfies the RIP. A matrix $\mathbf{A}$ is said to satisfy the RIP of order k if there exists a $\delta_k \in (0, 1)$ such that

$$ (1-\delta_k)\|\mathbf{x}\|_2^2 \le \|\mathbf{A}\mathbf{x}\|_2^2 \le (1+\delta_k)\|\mathbf{x}\|_2^2 \tag{14} $$

holds for all k-sparse vectors $\mathbf{x}$. The smallest constant $\delta_k$ satisfying (14) is known as the restricted isometry constant (RIC) [11]. When the sensing matrix $\mathbf{A}$ is random, the inequality should hold with overwhelming probability, i.e., a probability that approaches one as n grows. Many families of random matrices, e.g., i.i.d. Gaussian and Bernoulli random matrices, are known to satisfy the RIP [11], [12]. As a result, to evaluate the recovery performance, all we have to show is that the sensing matrix $\mathbf{A}$ in our scheme also obeys the RIP with overwhelming probability.

The RIP requires that the sensing matrix $\mathbf{A}$ preserve the Euclidean norm of sparse vectors well. For the signal model in (12), the entries of $\bar{\mathbf{Z}}$ are i.i.d. sub-Gaussian random variables (see Definition 6 in Section VII-A). It is known that random matrices (with sufficiently many rows and) with i.i.d. sub-Gaussian entries approximately preserve the Euclidean norm of sparse vectors with high probability [13]. Since $\mathbf{A} = \bar{\mathbf{Z}}\boldsymbol{\Sigma}$, we need to analyze the norm-preserving property of $\boldsymbol{\Sigma}$. To do so, we define the k-restricted extreme eigenvalues of the Gram matrix $\boldsymbol{\Sigma}^H\boldsymbol{\Sigma}$ as

$$ \rho_{\max}(k) = \max_{\mathbf{v}:\,\|\mathbf{v}\|_0 \le k,\ \|\mathbf{v}\|_2 = 1}\|\boldsymbol{\Sigma}\mathbf{v}\|_2^2, \qquad \rho_{\min}(k) = \min_{\mathbf{v}:\,\|\mathbf{v}\|_0 \le k,\ \|\mathbf{v}\|_2 = 1}\|\boldsymbol{\Sigma}\mathbf{v}\|_2^2, \tag{15} $$

where $\mathbf{v} \in \mathbb{C}^n$, and the "$\ell_0$-norm" $\|\mathbf{v}\|_0$ refers to the number of non-zero elements of $\mathbf{v}$. The extreme eigenvalues will be used to understand how the inhomogeneous SNRs affect the RIP.
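For small n, the k-restricted extreme eigenvalues in (15) can be computed by brute force over all supports of size exactly k: by Cauchy eigenvalue interlacing, a smaller support never yields a more extreme eigenvalue than a size-k support containing it. The sketch below (hypothetical helper name, assumed signal power pattern, DFT basis) does exactly that; its cost grows as C(n, k), so it only illustrates the definition and the bounds of Lemma 1.

```python
import itertools

import numpy as np


def restricted_extreme_eigenvalues(Sigma, k):
    """Brute-force (rho_max(k), rho_min(k)) of Sigma^H Sigma as defined in (15)."""
    G = Sigma.conj().T @ Sigma
    rho_max, rho_min = -np.inf, np.inf
    for T in itertools.combinations(range(G.shape[0]), k):
        idx = list(T)
        eig = np.linalg.eigvalsh(G[np.ix_(idx, idx)])   # real eigenvalues, ascending
        rho_max = max(rho_max, eig[-1])
        rho_min = min(rho_min, eig[0])
    return float(rho_max), float(rho_min)


rng = np.random.default_rng(8)
n, k = 12, 3
Psi = np.fft.fft(np.eye(n)) / np.sqrt(n)
gamma = rng.uniform(0.5, 1.5, n)                          # assumed signal power pattern
Sigma = np.diag(np.sqrt(gamma)) @ Psi / np.sqrt(gamma.mean())
rho_max, rho_min = restricted_extreme_eigenvalues(Sigma, k)

# Homogeneous pattern: Sigma equals the unitary Psi, so both values are one
rho_max_h, rho_min_h = restricted_extreme_eigenvalues(Psi, k)
```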

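Lemma 1. The following bounds on $\rho_{\max}(k)$ and $\rho_{\min}(k)$ hold:

$$ 1 \le \rho_{\max}(k) \le k, \qquad 0 \le \rho_{\min}(k) \le 1. \tag{16} $$

Proof: Fix a vector $\mathbf{v} \in \mathbb{C}^n$ such that $\|\mathbf{v}\|_2 = 1$ and $\|\mathbf{v}\|_0 = k$. Let $\mathcal{T} \subset \{1, \dots, n\}$ with $|\mathcal{T}| = k$ be the support of $\mathbf{v}$. Let $\boldsymbol{\Sigma}_{\mathcal{T}} \in \mathbb{C}^{n \times |\mathcal{T}|}$ be the submatrix of $\boldsymbol{\Sigma}$ with column indices $\mathcal{T}$. Denote the eigenvalues of the Gram matrix $\boldsymbol{\Sigma}_{\mathcal{T}}^H\boldsymbol{\Sigma}_{\mathcal{T}}$ by $\lambda_1 \ge \dots \ge \lambda_k \ge 0$. Due to the normalization in (11), the trace of $\boldsymbol{\Sigma}_{\mathcal{T}}^H\boldsymbol{\Sigma}_{\mathcal{T}}$ is $\sum_{t=1}^{k}\lambda_t = k$. Since the k eigenvalues are nonnegative and average to one, the largest eigenvalue is at least one and at most k. Similarly, the smallest eigenvalue is no larger than one. ■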
We note that the sparsity level k is usually much smaller than the number of sensors n in large-scale WSNs. We further assume $\rho_{\max}(k) \in [1, 2]$ in the following; this simplifies some of the mathematical arguments. We verify this assumption analytically and numerically in Section IV-C. To state our main theoretical result cleanly, we define two quantities that depend on $\boldsymbol{\Sigma}$ and k as follows



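$$ \xi_k(\boldsymbol{\Sigma}) = \max\{1 - \rho_{\min}(k),\ \rho_{\max}(k) - 1\}, \qquad \zeta_k(\boldsymbol{\Sigma}) = \max\left\{0,\ \frac{2 - \rho_{\max}(k) - \rho_{\min}(k)}{\rho_{\max}(k) - \rho_{\min}(k)}\right\}. \tag{17} $$

Since $\rho_{\max}(k) \in [1, 2]$, we have⁴ $\xi_k, \zeta_k \in [0, 1]$. Let $\theta_k = (1 + \zeta_k)\rho_{\max}(k) - 1$. Given $\delta_k \in (\xi_k, 1)$, for convenience, we map $\delta_k$ to a "modified RIC" via a piecewise linear mapping as follows

$$ \beta_k(\delta_k, \boldsymbol{\Sigma}) = \begin{cases} 1 - (1 - \delta_k)/\rho_{\min}(k), & \delta_k \in (\xi_k, \theta_k], \\ (1 + \delta_k)/\rho_{\max}(k) - 1, & \delta_k \in (\theta_k, 1). \end{cases} \tag{18} $$

Let $\alpha_k = 2/\rho_{\max}(k) - 1$. The inverse of $\beta_k(\delta_k, \boldsymbol{\Sigma})$ is denoted as

$$ \delta_k(\beta_k, \boldsymbol{\Sigma}) = \begin{cases} 1 - (1 - \beta_k)\rho_{\min}(k), & \beta_k \in (0, \zeta_k], \\ (1 + \beta_k)\rho_{\max}(k) - 1, & \beta_k \in (\zeta_k, \alpha_k). \end{cases} \tag{19} $$

In the sequel, we assume that the quantity $\xi_k(\boldsymbol{\Sigma})$ is a small positive number; it measures the inhomogeneity of the eigenvalues of $\boldsymbol{\Sigma}_{\mathcal{T}}^H\boldsymbol{\Sigma}_{\mathcal{T}}$ for $|\mathcal{T}| \le k$. This implies that both $\rho_{\max}(k)$ and $\rho_{\min}(k)$ are close to one, so that the deviation between $\beta_k$ and $\delta_k$ is also small. The validity of this assumption will be shown both analytically and numerically in Section IV-C.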
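The quantities in (17)-(19) translate directly into code. The sketch below assumes ρ_max(k) ≥ 1 ≥ ρ_min(k) and ρ_max(k) ∈ [1, 2], as in the text. When ρ_max(k) = ρ_min(k) the ratio in ζ_k is undefined; the sketch then sets ζ_k = 0, for which (18) reduces to β_k = δ_k. The function names are hypothetical, and the example values are those quoted in Example 1 (Section IV-B).

```python
def inhomogeneity_params(rho_max, rho_min):
    """Return (xi_k, zeta_k, theta_k, alpha_k) from (17) and the definitions after it."""
    xi = max(1.0 - rho_min, rho_max - 1.0)
    if rho_max == rho_min:
        zeta = 0.0        # homogeneous case: ratio in (17) undefined, beta_k = delta_k anyway
    else:
        zeta = max(0.0, (2.0 - rho_max - rho_min) / (rho_max - rho_min))
    theta = (1.0 + zeta) * rho_max - 1.0
    alpha = 2.0 / rho_max - 1.0
    return xi, zeta, theta, alpha


def beta_of_delta(delta, rho_max, rho_min):
    """Modified RIC beta_k(delta_k, Sigma) of (18); requires delta in (xi_k, 1)."""
    xi, _, theta, _ = inhomogeneity_params(rho_max, rho_min)
    if not xi < delta < 1.0:
        raise ValueError("delta_k must lie in (xi_k, 1)")
    if delta <= theta:
        return 1.0 - (1.0 - delta) / rho_min
    return (1.0 + delta) / rho_max - 1.0


def delta_of_beta(beta, rho_max, rho_min):
    """Inverse map delta_k(beta_k, Sigma) of (19); requires beta in (0, alpha_k)."""
    _, zeta, _, alpha = inhomogeneity_params(rho_max, rho_min)
    if not 0.0 < beta < alpha:
        raise ValueError("beta_k must lie in (0, alpha_k)")
    if beta <= zeta:
        return 1.0 - (1.0 - beta) * rho_min
    return (1.0 + beta) * rho_max - 1.0


rho_max, rho_min = 1.09, 0.88          # values quoted in Example 1
betas = {d: beta_of_delta(d, rho_max, rho_min) for d in (0.15, 0.2, 0.3, 0.5, 0.9)}
```

Requiring δ_k > ξ_k is what keeps β_k positive in both branches, and the two branches of (18) meet at δ_k = θ_k with common value ζ_k, so the map is continuous.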

Recall that the sensing matrix $\mathbf{A} = \bar{\mathbf{Z}}\boldsymbol{\Sigma}$ in (12), where all elements of the $m \times n$ matrix $\bar{\mathbf{Z}}$ are i.i.d. mixed Gaussian random variables, $\boldsymbol{\Sigma}$ is defined in (11), and n is the number of sensors. We now state our main theoretical result.



⁴ The arguments of some quantities are sometimes omitted for notational convenience.
Theorem 1. Let $c_1, c_2 > 0$ be some universal constants. Given a sparsity level $k \le \lfloor n/2 \rfloor$, a transmit probability $p \in (0, 1]$ and a number $\delta_k \in (\xi_k, 1)$, if the number of measurements satisfies

$$ m \ge \frac{c_1 k\,\rho_{\max}(k)}{p^2\beta_k^2\,\rho_{\min}(k)}\log\frac{5en}{k}, \tag{20} $$

where $\beta_k = \beta_k(\delta_k, \boldsymbol{\Sigma})$ is defined in (18), then for any vector $\mathbf{x}$ with support of cardinality at most k, the RIP in (14) holds with probability at least

$$ 1 - \exp\left(-c_2 m p^2\beta_k^2/4\right). \tag{21} $$

Proof: See Section VII-A. ■

Remark 1 (Specialization to the homogeneous case). Clearly, the lower bound on the required number of measurements is $O\big(\frac{k\rho_{\max}(k)}{p^2\beta_k^2\rho_{\min}(k)}\log\frac{n}{k}\big)$. For the homogeneous signal power pattern (i.e., the matrix $\boldsymbol{\Gamma}$ is a multiple of the identity matrix $\mathbf{I}_n$), we have $\rho_{\max}(k) = \rho_{\min}(k) = 1$ and $\beta_k = \delta_k$. Thus the lower bound reduces to $O\big(\frac{k}{p^2\delta_k^2}\log\frac{n}{k}\big)$, which coincides with the known results for i.i.d. random sensing matrices. See Theorem 5.2 in [12] and Section 1.4.4 in [13].
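To see how the signal power pattern enters (20), the sketch below evaluates its right-hand side with the unspecified universal constant c_1 set to 1 (an arbitrary placeholder, so only ratios are meaningful). It compares the inhomogeneous case of Example 1 (ρ_max(k) = 1.09, ρ_min(k) = 0.88) with the homogeneous case of Remark 1 at the same δ_k. The helper name is hypothetical.

```python
import math


def measurement_bound(n, k, p, beta, rho_max, rho_min, c1=1.0):
    """Right-hand side of (20); c1 = 1.0 is a placeholder for the unknown constant."""
    return c1 * k * rho_max / (p ** 2 * beta ** 2 * rho_min) * math.log(5 * math.e * n / k)


n, k, p, delta = 500, 5, 0.8, 0.3
rho_max, rho_min = 1.09, 0.88
# delta = 0.3 exceeds theta_k for these rho values, so the second branch of (18) applies
beta = (1 + delta) / rho_max - 1

m_inhom = measurement_bound(n, k, p, beta, rho_max, rho_min)
m_hom = measurement_bound(n, k, p, delta, 1.0, 1.0)   # Remark 1: rho = 1 and beta_k = delta_k
penalty = m_inhom / m_hom                            # = (rho_max / rho_min) * (delta / beta)^2
```

Because c_1 and the logarithm cancel in the ratio, the penalty depends only on r(k) = ρ_max(k)/ρ_min(k) and on how far β_k falls below δ_k; for these values it comes out close to a factor of 3.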

Remark 2 (Contribution to the RIP analysis). Due to the inhomogeneous signal power pattern, the rows $\mathbf{a}_i$ of the sub-Gaussian sensing matrix $\mathbf{A}$ are non-isotropic. To the best of our knowledge, little is known about the RIP of non-isotropic sub-Gaussian random matrices. The only relevant result is Remark 5.40 in [13], which gives a concentration inequality for non-isotropic random sensing matrices in terms of the upper bound on the spectral norm. However, that result does not show how the inhomogeneity affects the RIP, nor does it quantify the number of measurements required to satisfy the RIP. Theorem 1 fills this gap.

Remark 3. Theorem 1 is proved in Section VII-A by leveraging Theorem 2.1 of [14], which states that a sufficient condition for the approximate preservation of the Euclidean norm under a random linear mapping is that the number of measurements be proportional to the fourth power of the sub-Gaussian norm. In our scenario, as shown in Lemma 6, the sub-Gaussian norm is bounded above by $1/\sqrt{p}$. In addition, Lemma 6 shows (using the Chernoff bound) that the sub-Gaussian tail probability is bounded above by $pe^{-pt^2/2}$. Note that the sub-Gaussian norm is the smallest constant $g > 0$ for which the sub-Gaussian tail probability is bounded above by $2e^{-t^2/(2g^2)}$ (Definition 6). Since the pre-factor in our bound is p (and not 2), there is some degradation with respect to p in Theorem 1. For larger p, the degradation is reduced.
B. Achievable System Delay

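The performance of the wireless compressive sensing scheme is characterized by two quantities, i.e., the MSE and the system delay. The MSE performance under bounded noise is studied in the CS literature [?], [15], [16]. Note that there is often a trade-off between the two quantities. Under an allowable MSE $\epsilon > 0$, we thus analyze the achievable system delay $D(\epsilon)$, which is defined as

$$ D(\epsilon) = \min_{m}\ m \quad \text{subject to} \quad \mathbb{E}\|\hat{\mathbf{x}} - \mathbf{x}\|_2^2 \le \epsilon. \tag{22} $$

Corollary 1. Let $p, m, n, k, \boldsymbol{\Sigma}, \xi_k$ be as in Theorem 1. Let $\epsilon_{\mathrm{th}} = 1/(0.0942 \times \mathrm{SNR}_{\mathrm{ave}})$. Given an allowable MSE $\epsilon > \epsilon_{\mathrm{th}}$, with overwhelming probability (exceeding (21)), the achievable system delay is

$$ D(\epsilon) = \frac{c_1 k\,\rho_{\max}(k)}{p^2(\beta_k^*)^2\,\rho_{\min}(k)}\log\frac{5en}{k}, \tag{23} $$

where

$$ \beta_k^*(\boldsymbol{\Sigma}, \epsilon) = \begin{cases} 1 - \dfrac{0.693 + 1/\sqrt{\epsilon\,\mathrm{SNR}_{\mathrm{ave}}}}{\rho_{\min}(k)}, & \delta_k^* \in (\xi_k, \theta_k], \\[2ex] \dfrac{1.307 - 1/\sqrt{\epsilon\,\mathrm{SNR}_{\mathrm{ave}}}}{\rho_{\max}(k)} - 1, & \delta_k^* \in (\theta_k, 1), \end{cases} \tag{24} $$

and $\delta_k^* = 0.307 - 1/\sqrt{\epsilon\,\mathrm{SNR}_{\mathrm{ave}}}$.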

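Proof: We start the proof by leveraging the following lemma.

Lemma 2 (Theorem 3.2 of [15]). Let $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{e}$, where $\mathbf{x}$ is a k-sparse vector in $\mathbb{C}^n$ and $\mathbf{e} \in \mathbb{C}^m$ is a zero-mean white random vector whose entries have variance $\sigma^2$. If $\mathbf{A}$ satisfies the RIP with RIC $\delta_k < 0.307$, then the solution $\hat{\mathbf{x}}$ to the $\ell_1$-minimization problem in the CS decoder [3], [13] satisfies

$$ \mathbb{E}\|\hat{\mathbf{x}} - \mathbf{x}\|_2^2 \le \frac{m\sigma^2}{(0.307 - \delta_k)^2}. \tag{25} $$

Recall the definition of $\mathrm{SNR}_{\mathrm{ave}}$ in (13). Applying Lemma 2 to the normalized model (12), whose noise variance is $\bar{\sigma}^2 = 1/(m\,\mathrm{SNR}_{\mathrm{ave}})$, we see that to achieve an MSE $\epsilon$ it suffices to ensure that the RIC satisfies $\delta_k^* = 0.307 - 1/\sqrt{\epsilon\,\mathrm{SNR}_{\mathrm{ave}}}$. From Theorem 1, the minimum number of measurements required for the RIP to hold with overwhelming probability is

$$ m^* = \frac{c_1 k\,\rho_{\max}(k)}{p^2(\beta_k^*)^2\,\rho_{\min}(k)}\log\frac{5en}{k}, \tag{26} $$

where $\beta_k^*$ is given in (24). The definition of the achievable system delay establishes Corollary 1. ■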
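Corollary 1 can be evaluated numerically once ρ_max(k), ρ_min(k) and SNR_ave are known. The sketch below again sets the unknown constant c_1 to 1, so the returned delays are meaningful only up to that constant. It returns infinity below ε_th, as in Remark 4, and also (an extra assumption of this sketch) whenever δ*_k ≤ ξ_k, where Theorem 1 gives no guarantee. The parameter values follow Example 1; the function name is hypothetical.

```python
import math


def achievable_delay(eps, snr_ave, n, k, p, rho_max, rho_min, c1=1.0):
    """D(eps) from (23)-(24), up to the placeholder constant c1 (assumed 1.0)."""
    eps_th = 1.0 / (0.0942 * snr_ave)
    if eps <= eps_th:
        return math.inf                                   # Remark 4
    delta_star = 0.307 - 1.0 / math.sqrt(eps * snr_ave)   # target RIC from Lemma 2
    xi = max(1.0 - rho_min, rho_max - 1.0)
    if delta_star <= xi:
        return math.inf                                   # outside the range of Theorem 1
    if rho_max == rho_min:
        zeta = 0.0
    else:
        zeta = max(0.0, (2.0 - rho_max - rho_min) / (rho_max - rho_min))
    theta = (1.0 + zeta) * rho_max - 1.0
    if delta_star <= theta:                               # first branch of (24)
        beta_star = 1.0 - (0.693 + 1.0 / math.sqrt(eps * snr_ave)) / rho_min
    else:                                                 # second branch of (24)
        beta_star = (1.307 - 1.0 / math.sqrt(eps * snr_ave)) / rho_max - 1.0
    return c1 * k * rho_max / (p ** 2 * beta_star ** 2 * rho_min) * math.log(5 * math.e * n / k)


n, k, p = 500, 5, 0.8
rho_max, rho_min = 1.09, 0.88
snr_35db = 10 ** (35 / 10)
delays = [achievable_delay(eps, snr_35db, n, k, p, rho_max, rho_min) for eps in (0.02, 0.05, 0.1, 0.5)]
```

Since δ*_k grows with both ε and SNR_ave and β*_k is increasing in δ*_k, the computed delays fall as either quantity grows, in line with Remark 5.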

Remark 4. Note that Corollary 1 applies only to the case where the MSE $\epsilon$ is greater than the threshold $\epsilon_{\mathrm{th}}$. If $\epsilon \le \epsilon_{\mathrm{th}}$, then simple algebra reveals that the required RIC $\delta_k^*$ is (essentially) zero or negative, which would require the sensing matrix $\mathbf{A}$ to be a perfect isometry. Since $\mathbf{A}$ is random, and its entries are governed by a density that is absolutely continuous w.r.t. the Lebesgue measure, this occurs with probability zero, implying that the constraint in (22) is almost surely not satisfied. Thus, in this case, we define the system delay to be $\infty$.

Remark 5. As either $\epsilon$ or $\mathrm{SNR}_{\mathrm{ave}}$ increases, $\beta_k^*$ increases, and thus the system delay $D(\epsilon)$ decreases. More importantly, we note from Corollary 1 that the key measure of the inhomogeneity of SNRs is the ratio $r(k) = \rho_{\max}(k)/\rho_{\min}(k) \in [1, \infty)$. The system delay increases as $r(k)$ increases from one. We hence analyze the impact of inhomogeneity of SNRs on the deviation of $\rho_{\max}(k)$ and $\rho_{\min}(k)$ from unity in Section IV-C. In addition, the system delay decreases as p increases, since $\mathrm{SNR}_{\mathrm{ave}}$ defined in (13) increases with p. Thus, there is an inherent tradeoff between system delay and energy consumption, because a large p implies high transmit energy. From the perspective of delay alone, it is therefore advantageous to transmit with as high a probability as the causal energy constraint allows.

Example 1. Let the number of sensors be n = 500, the sparsity level k = 5 and the transmit probability p = 0.8. These parameters imply $\rho_{\max}(k) = 1.09$ and $\rho_{\min}(k) = 0.88$ (see Section V). We plot the achievable system delay $D(\epsilon)$ against the allowable MSE $\epsilon$ for different average SNRs in Fig. 2. We observe that beyond the MSE threshold (which depends on the average SNR), the system delay $D(\epsilon)$ decreases as either $\epsilon$ or $\mathrm{SNR}_{\mathrm{ave}}$ increases, as expected.

Remark 6. We considered the scenario in which the FC collects one data vector from all sensors in one frame. As a generalization of our setup, one can seek to minimize the total number of slots for collecting multiple data vectors. By adjusting the transmit probability in each frame, one can allocate different powers to different frames, such that both the recovery accuracy and the causal energy constraint are guaranteed. Details of this possible extension are beyond the scope of this paper.

C. Effect of Inhomogeneity 

This section investigates the impact of inhomogeneity of (receive) SNRs on the number of measurements needed to satisfy the RIP. Without loss of generality, we assume all sensors have the same noise power; hence, it suffices to analyze the impact of inhomogeneity of receive signal powers. We focus on the asymptotic scenario where the number of sensors $n$ tends to infinity and, for ease of analysis, $k$ is kept constant. To make the dependence on $n$ explicit, we denote $\rho_{\max}(k)$ (resp. $\rho_{\min}(k)$) as $\rho_{\max}(k,n)$ (resp. $\rho_{\min}(k,n)$). It will be shown that both $\rho_{\max}(k,n)$ and $\rho_{\min}(k,n)$ concentrate around one when $n$ is large, and that the rate of convergence to one depends on the inhomogeneity of SNRs. This implies that the recovery performance (the required number of measurements and the probability that the RIP holds in Theorem 1) is not sensitive to the inhomogeneity of SNRs when $n$ is large.

Fig. 2. Plot of achievable system delay against allowable MSE for different average SNRs (legend entries include Ave. SNR = 35 dB). Beyond the MSE threshold, the achievable system delay decreases as either the allowable MSE or the average SNR increases.

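Let $\mathbf{w} = \Sigma\mathbf{v}$, where the unit-norm, $k$-sparse vector $\mathbf{v}$ is supported on the set $T = \{s_1, \ldots, s_k\} \subset \{1, \ldots, n\}$ with $s_1 < \cdots < s_k$. To obtain further insights, we let $\Psi$ be the $n$-point discrete Fourier transform (DFT) matrix. Writing the nonzero entries of $\mathbf{v}$ as $v_{s_q} = A_q e^{j\theta_q}$, $q = 1, \ldots, k$, with $\sum_{q=1}^k A_q^2 = 1$, the squared $\ell_2$-norm of $\mathbf{w}$ can be expressed as follows:

$$\|\mathbf{w}\|_2^2 = \frac{1}{nP_{\mathrm{ave}}}\sum_{i=1}^{n}\gamma_i\left(1 + \sum_{q=1}^{k}\sum_{l=1,\, l\neq q}^{k} A_q A_l \cos\!\left(\theta_{q,l} + \frac{2\pi(i-1)\Delta_{q,l}}{n}\right)\right), \tag{27}$$

where $\theta_{q,l} = \theta_q - \theta_l$ and $\Delta_{q,l} = s_l - s_q$.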

Since $\|\mathbf{w}\|_2^2$ is strongly influenced by the inner summation terms, we analyze the behavior of these terms more carefully in the sequel. When the signal power pattern is homogeneous, i.e., $\Gamma = \mathrm{diag}(\sqrt{\gamma}, \ldots, \sqrt{\gamma})$, every cosine sum over $i$ in (27) vanishes, so $\|\mathbf{w}\|_2^2 = \|\Sigma\mathbf{v}\|_2^2 = 1$; hence $\rho_{\max}(k,n) = \rho_{\min}(k,n) = 1$ for all $k, n$.
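The expansion (27) and the homogeneous special case can be checked numerically. The sketch below is a minimal illustration under an assumption not stated in this section: that $\Sigma = \mathrm{diag}\bigl(\sqrt{\gamma_1/P_{\mathrm{ave}}}, \ldots, \sqrt{\gamma_n/P_{\mathrm{ave}}}\bigr)\,\Psi$ with $\Psi$ the unitary DFT (NumPy's `fft` with `norm="ortho"`). The helper names and the example powers are hypothetical.

```python
import numpy as np


def w_norm_sq_direct(gamma, v):
    """||Sigma v||_2^2, ASSUMING Sigma = diag(sqrt(gamma / P_ave)) @ (unitary DFT)."""
    p_ave = gamma.mean()
    w = np.sqrt(gamma / p_ave) * np.fft.fft(v, norm="ortho")
    return float(np.sum(np.abs(w) ** 2))


def w_norm_sq_formula(gamma, v):
    """Right-hand side of (27) for a unit-norm sparse v with entries A_q e^{j theta_q}."""
    n = v.size
    support = np.flatnonzero(v)              # s_1 < ... < s_k (0-based indices)
    amp, phase = np.abs(v[support]), np.angle(v[support])
    i = np.arange(n)                         # plays the role of (i - 1) in (27)
    inner = np.ones(n)                       # the leading 1 uses sum_q A_q^2 = 1
    for q in range(support.size):
        for l in range(support.size):
            if l != q:
                theta_ql = phase[q] - phase[l]
                delta_ql = support[l] - support[q]
                inner += amp[q] * amp[l] * np.cos(theta_ql + 2 * np.pi * i * delta_ql / n)
    return float(np.sum(gamma * inner) / (n * gamma.mean()))


def random_sparse_unit(n, k, rng):
    """A random unit-norm complex vector with exactly k nonzero entries."""
    v = np.zeros(n, dtype=complex)
    support = rng.choice(n, size=k, replace=False)
    v[support] = rng.standard_normal(k) + 1j * rng.standard_normal(k)
    return v / np.linalg.norm(v)


rng = np.random.default_rng(1)
v = random_sparse_unit(64, 3, rng)
gamma_inhom = rng.uniform(0.1, 0.3, size=64)   # inhomogeneous powers (illustrative)
gamma_hom = np.full(64, 0.2)                   # homogeneous powers
print(w_norm_sq_direct(gamma_inhom, v), w_norm_sq_formula(gamma_inhom, v))
print(w_norm_sq_direct(gamma_hom, v))          # equals 1 up to rounding
```

The direct and expanded evaluations agree for any powers, and the homogeneous case returns one because each cosine sum runs over a full period.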

We are interested in how $\rho_{\max}(k,n)$ and $\rho_{\min}(k,n)$ vary with different signal powers $\gamma_i$. Thus, we consider a model in which the $\gamma_i$'s are i.i.d. random variables following an approximately Gaussian distribution. By varying the variance of this distribution, we are in fact varying the inhomogeneity of the signal powers. Specifically, to deal with the fact that the signal powers cannot be negative, we use the following truncated Gaussian distribution to model the signal powers.

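Definition 2. A random variable $X$ is truncated Gaussian, denoted as $X \sim \mathcal{N}_{\mathrm{tr}}(\mu, \omega^2)$, if its pdf is

$$g_X(x;\mu,\omega) = \frac{1}{\sqrt{2\pi}\,\omega\,\bigl(1 - Q(\mu/\omega)\bigr)}\exp\!\left(-\frac{(x-\mu)^2}{2\omega^2}\right) \tag{28}$$

for $x \ge 0$ and $g_X(x;\mu,\omega) = 0$ otherwise, where $Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-t^2/2}\,dt$ is the Q-function (the tail function of the standard Gaussian distribution).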
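The density (28) can be sanity-checked numerically. The sketch below assumes NumPy and uses rejection sampling (keep only the nonnegative draws of $\mathcal{N}(\mu,\omega^2)$), which is one simple way to sample $\mathcal{N}_{\mathrm{tr}}(\mu,\omega^2)$ and not necessarily the generator used for the simulations in Section V. The parameters $\mu = 0.2$ and $\omega = 0.1$ (i.e., $d = 2$) mirror that section; the function names are hypothetical.

```python
import math

import numpy as np


def q_function(x):
    """Standard Gaussian tail Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def truncated_gaussian_pdf(x, mu, omega):
    """Density (28): N(mu, omega^2) restricted to [0, inf) and renormalized."""
    x = np.asarray(x, dtype=float)
    norm = math.sqrt(2.0 * math.pi) * omega * (1.0 - q_function(mu / omega))
    pdf = np.exp(-((x - mu) ** 2) / (2.0 * omega**2)) / norm
    return np.where(x >= 0.0, pdf, 0.0)


def sample_truncated_gaussian(mu, omega, size, rng):
    """Rejection sampling: draw N(mu, omega^2) and keep nonnegative values only."""
    kept = np.empty(0)
    while kept.size < size:
        draws = rng.normal(mu, omega, size=2 * size)
        kept = np.concatenate([kept, draws[draws >= 0.0]])
    return kept[:size]


def truncated_gaussian_mean(mu, omega):
    """E[X] = mu + omega * phi(mu/omega) / (1 - Q(mu/omega)) for truncation at zero."""
    a = mu / omega
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    return mu + omega * phi / (1.0 - q_function(a))


mu, omega = 0.2, 0.1
grid = np.linspace(0.0, mu + 12.0 * omega, 200_001)
mass = float(np.sum(truncated_gaussian_pdf(grid, mu, omega)) * (grid[1] - grid[0]))
samples = sample_truncated_gaussian(mu, omega, 200_000, np.random.default_rng(0))
print(mass, samples.mean(), truncated_gaussian_mean(mu, omega))
```

The normalizing factor $1 - Q(\mu/\omega)$ is exactly the probability that the parent Gaussian is nonnegative, which is why the Riemann sum of the density comes out at one.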

We assume that $\gamma_i \sim \mathcal{N}_{\mathrm{tr}}(\mu, \omega^2)$ for all $i = 1, \ldots, n$ and that they are mutually independent. Given $\mu$, the "variance" $\omega^2$ is a measure of the degree of inhomogeneity of the signal powers $\gamma_i$. Also, the parameter $d = \mu/\omega$ is a measure of the homogeneity of the SNRs: if $d$ is small (resp. large), the SNRs are less (resp. more) homogeneous. We use the exponential asymptotic notation $a_n \,\dot{\le}\, \exp(-nE)$ to mean that $\limsup_{n\to\infty} \frac{1}{n}\log a_n \le -E$. Under the above assumptions on the statistics of the signal powers, we have the following large deviations upper bounds on $\rho_{\max}(k,n)$ and $\rho_{\min}(k,n)$:

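Theorem 2. Let $d = \mu/\omega$. For any $t > 0$ and any constant $1 \le k \le \lfloor n/2 \rfloor$,

$$\mathbb{P}\bigl(\rho_{\max}(k,n) > 1 + t\bigr) \,\dot{\le}\, \exp\bigl[-n d^2 E(k,t)^2\bigr],\qquad \mathbb{P}\bigl(\rho_{\min}(k,n) < 1 - t\bigr) \,\dot{\le}\, \exp\bigl[-n d^2 E(k,t)^2\bigr], \tag{29}$$

where the exponent is defined as $E(k,t) = t/(k - 1 + \sqrt{2}\,t)$.

Proof: See Section VII-B. ■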
Recall that Theorem 1 states that both the required number of measurements and the probability that the RIP holds depend on the ratio $r(k,n) = \rho_{\max}(k,n)/\rho_{\min}(k,n)$. From Theorem 2, we note that both $\rho_{\max}(k,n)$ and $\rho_{\min}(k,n)$ concentrate around one in the large-$n$ regime (for bounded $k$), and that the rate of convergence to one depends on the inhomogeneity of SNRs. This allows us to conclude that for large-scale EH-WSNs (relative to the signal sparsity), the inhomogeneity of SNRs does not significantly affect the RIP and the system delay, which is a surprisingly positive observation.

Remark 7. We note that $E(k,t)$ is an increasing function of $t$ and a decreasing function of the sparsity $k$, as expected. Also, the exponent $d^2E(k,t)^2$ increases with $d$, which means that the convergence of $\rho_{\max}(k,n)$ and $\rho_{\min}(k,n)$ to unity is faster when $d$ is large or, equivalently, when the signal powers are more homogeneous. In particular, $\rho_{\max}(k,n)$ is close to one in the large-$n$ regime. This validates the assumption that $\rho_{\max}(k,n) \in [1,2]$ in Section IV-A.
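To see the monotonicity claims of Remark 7 concretely, the exponent of Theorem 2 can be tabulated. This is a minimal sketch; since (29) is stated in the exponential-order sense $\dot{\le}$, the rate $d^2E(k,t)^2$ computed here indicates only how fast the probabilities decay with $n$, not a finite-$n$ probability bound. The function names are hypothetical.

```python
import math


def exponent_E(k, t):
    """Large-deviations exponent E(k, t) = t / (k - 1 + sqrt(2) * t) from Theorem 2."""
    return t / (k - 1 + math.sqrt(2.0) * t)


def decay_rate(d, k, t):
    """Rate d^2 * E(k, t)^2 multiplying n in the exponent of (29)."""
    return d**2 * exponent_E(k, t) ** 2


# The rate grows with t and d and shrinks with k (values for illustration only).
for d in (1, 2, 3):
    print(d, [round(decay_rate(d, k, 0.04), 6) for k in (2, 5, 10)])
```

Because the rate is quadratic in $d$, doubling $d$ multiplies it by exactly four, which matches the slope pattern reported for Fig. 6.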
Remark 8. In the preceding analysis, and particularly in Theorem 2, we assumed that $k$ does not grow with $n$. Close examination of the proof shows that if $k = \lfloor n^{1/2-\lambda} \rfloor$ for any $\lambda \in (0, 1/2]$, then the probability of the event $\{\rho_{\max}(k,n) > 1 + t\}$ still goes to zero, albeit at the slower rate $\sim \exp(-n^{2\lambda} d^2 t^2)$ (not exponential in $n$). More precisely, we can verify that

$$\limsup_{n\to\infty} \frac{1}{n^{2\lambda}} \log \mathbb{P}\bigl(\rho_{\max}(k,n) > 1 + t\bigr) \le -d^2t^2, \tag{30}$$

and analogously for $\{\rho_{\min}(k,n) < 1 - t\}$. Inequality (30) is a so-called moderate-deviations result [11, Sec. 3.7]. Notice that the dependencies on the homogeneity $d = \mu/\omega$ and on $t$ are similar to those in (29).

Remark 9. One may wonder whether Theorem 2 depends strongly on $\Psi$ being the DFT matrix. In fact, the only property of the DFT that we exploit in the proof of Theorem 2 is its circular symmetry, i.e., each basis vector of the DFT (whose elements are powers of the $n$-th root of unity) is uniformly distributed over the circle in the complex plane. Hence, certain Cesàro sums converge to zero and the proof goes through; see (44) in Section VII-B. Thus, Theorem 2 also applies to other sparsity-inducing bases whose basis vectors have the circular symmetry property, e.g., the discrete cosine transform (DCT) or the Hadamard transform.

V. Simulation Results 

We now numerically validate our results. We set the number of sensors $n = 500$ and the transmit probability $p = 0.8$. We use the truncated Gaussian distribution with $\mu = 0.2$ to model the receive signal powers, and use the basis pursuit de-noising (BPDN) algorithm [18] as the CS decoder.

First, we fix $d = 2$, which implies $\omega = \mu/d = 0.1$. Fig. 3 plots the MSE against the number of measurements (or transmissions) $m$ for different sparsities $k$ and different average SNRs. As expected, the MSE decreases as either $k$ decreases or the average SNR increases. Consider the MSE level $2 \times 10^{-3}$. When the average SNR is 25 dB, the wireless compressive sensing scheme achieves a smaller system delay of $D = 68$ for $k = 5$ compared to $D = 115$ for $k = 10$. When the sparsity is $k = 5$, the scheme achieves a smaller system delay of $D = 39$ for $\mathrm{SNR}_{\mathrm{ave}} = 30$ dB compared to $D = 68$ for $\mathrm{SNR}_{\mathrm{ave}} = 25$ dB.

Second, we fix $d = 2$ and the average SNR to be 25 dB. Fig. 4 compares the MSEs of the inhomogeneous-SNR and homogeneous-SNR scenarios for the sparsity levels $k = 5, 10, 20$. It is observed that in the inhomogeneous scenario, the MSE performance is slightly worse than that of the homogeneous-SNR scenario. Note that the degradation becomes larger as the sparsity $k$ increases. This is because the convergence of $\rho_{\max}(k)$ and $\rho_{\min}(k)$ to one is faster if $k$ is small relative to $n$. This corroborates the observation in Section IV-C.

Fig. 3. Plot of MSE against the number of measurements $m$ (legend: SNR = 20, 25, 30 dB with $k = 5, 10$). The MSE decreases as $k$ decreases or as the average SNR increases.

Fig. 4. Plot of MSE against the number of measurements $m$. The MSE performance for the inhomogeneous scenario is slightly worse than that of the homogeneous-SNR scenario.

Third, we set $d = 1, 2$ and $k = 5, 10$. Fig. 5 shows the cumulative distribution function (CDF) of $\rho_{\max}(k,500)$ and $\rho_{\min}(k,500)$. We note that both $\rho_{\max}(k,500)$ and $\rho_{\min}(k,500)$ are more tightly concentrated around one for larger $d$ or, equivalently, for more homogeneous SNRs. Also, under the same inhomogeneous SNRs, both are more tightly concentrated around one for smaller $k$.

Fig. 5. CDF of $\rho_{\max}(k,500)$ and $\rho_{\min}(k,500)$. Both concentrate more tightly around one for more homogeneous SNRs (i.e., large $d$).

Finally, we numerically validate the asymptotic behavior of $\rho_{\max}(k,n)$ as $n$ grows. We set $k = 5$ and $d = 1, 2, 3$. Fig. 6 shows the probability that $\rho_{\max}(k,n) > 1.04$ for different $n$. It is observed that the logarithm of the probability decreases linearly as $n$ grows (when $n/k$ is large) and, furthermore, that the slope varies quadratically w.r.t. $d$, i.e., the slope is proportional to $-1, -4, -9$ for $d = 1, 2, 3$, respectively. This observation corroborates Theorem 2.

VI. Conclusion 

In this paper, we considered the scenario in which each sensor independently decides whether or not to transmit with some probability $p$, and the overall transmission power (and thus $p$) depends on its available energy. Hence, only a subset of sensors transmits concurrently to the FC, which exploits the spatial combination inherent in wireless channels. We used techniques from CS theory to prove a lower bound on the number of measurements required to satisfy the RIP, and hence to ensure that the data recovery is both accurate and computationally efficient (amenable to convex optimization). We also computed an achievable system delay given an allowable MSE. Finally, we analyzed the impact of inhomogeneity on the $k$-restricted extreme eigenvalues, which govern the number of measurements required for the RIP to hold. In large-scale EH-WSNs, we showed using large deviations techniques that the recovery accuracy and the system delay are not sensitive to the inhomogeneity of SNRs.

Fig. 6. Plot of the probability that $\rho_{\max}(k,n) > 1.04$ against the number of sensors. The logarithm of the probability decreases linearly as $n$ grows, and the slope varies quadratically w.r.t. $d$.

VII. Proofs of Main Results 

A. Proof of Theorem 1

Proof: Recall the signal model in (12), i.e., $\mathbf{y} = Z\Sigma\mathbf{x} + \mathbf{e} = A\mathbf{x} + \mathbf{e}$. The proof involves three steps. In steps 1 and 2, we prove the desired result when all quantities are real; in step 3, we extend the result to the complex case. For the real case, we show that the matrix $Z$ acts as an isometry on the images of sparse vectors under the matrix $\Sigma$, i.e., on the set $\{\Sigma\mathbf{v} : \|\mathbf{v}\|_0 \le k,\ \mathbf{v} \in \mathbb{R}^n\}$. By showing that the rows of $Z$ are isotropic sub-Gaussian and by exploiting the so-called "restricted eigenvalue property" of $\Sigma$, we derive an RIP for the matrix $A$ in step 2. Before step 1, we start with the following preliminaries. Let $d(\mathbf{u},\mathbf{v})$ be the Euclidean distance in $\mathbb{R}^n$.

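Definition 3 (Nets, covering numbers [13]). Consider a metric space $(\mathcal{U}, d)$ with $\mathcal{U} \subset \mathbb{R}^n$ and a positive number $\epsilon$. A subset $\mathcal{N}_\epsilon \subset \mathcal{U}$ is called an $\epsilon$-net of $\mathcal{U}$ if every point $\mathbf{u} \in \mathcal{U}$ can be approximated to within $\epsilon$ by some point $\mathbf{v} \in \mathcal{N}_\epsilon$, i.e., $d(\mathbf{u},\mathbf{v}) \le \epsilon$. The covering number $\mathcal{N}(\mathcal{U}, \epsilon)$ is the cardinality of the smallest $\epsilon$-net of $\mathcal{U}$.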
Definition 4 (Set of sparse vectors). Let $S^{n-1}$ be the unit sphere in $\mathbb{R}^n$ and $1 \le k \le n$. Define

$$\mathcal{U}_k \triangleq \{\mathbf{u} \in S^{n-1} : \|\mathbf{u}\|_0 \le k\};$$

also define the subset of the Euclidean unit ball $B_2^n$ with (at most) $k$-sparse vectors as

$$\bar{\mathcal{U}}_k = \{\mathbf{u} \in B_2^n : \|\mathbf{u}\|_0 \le k\}.$$

Lemma 3 (Upper bound on covering numbers, Lemma 2.3 in [14]). Let $0 < \epsilon < 1$ and $1 \le k \le n$. There exists an $\epsilon$-net of $\mathcal{U}_k$, namely $\mathcal{N}_\epsilon$, whose cardinality can be upper bounded as
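Definition 5 (Complexity measure [14]). The complexity of a set $\mathcal{V} \subset \mathbb{R}^n$ is defined as

$$\ell_*(\mathcal{V}) \triangleq \mathbb{E}\Bigl[\sup_{\mathbf{v} \in \mathcal{V}} |\langle \mathbf{v}, \mathbf{u} \rangle|\Bigr],$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product in $\mathbb{R}^n$, $\mathbf{u} \sim \mathcal{N}(\mathbf{0}, I_n)$ is a standard Gaussian random vector, and the supremum is over all vectors $\mathbf{v} \in \mathcal{V}$.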
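For intuition about Definition 5, take the simplest case $\Sigma = I_n$ (an illustrative simplification, not the setting of the paper) and $\mathcal{V} = \mathcal{U}_k$. Then the supremum has a closed form: $\sup_{\mathbf{v}\in\mathcal{U}_k}|\langle\mathbf{v},\mathbf{u}\rangle|$ equals the $\ell_2$-norm of the $k$ largest-magnitude entries of $\mathbf{u}$, so $\ell_*(\mathcal{U}_k)$ can be estimated by Monte Carlo. A minimal NumPy sketch with hypothetical names:

```python
import numpy as np


def sup_over_sparse_sphere(u, k):
    """sup over unit-norm, k-sparse v of |<v, u>| = l2-norm of the k largest |u_i|."""
    top_k = np.sort(np.abs(u))[-k:]
    return float(np.sqrt(np.sum(top_k**2)))


def complexity_sparse_sphere(n, k, trials=2000, seed=0):
    """Monte Carlo estimate of ell_*(U_k) = E[sup_{v in U_k} |<v, u>|], u ~ N(0, I_n)."""
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((trials, n))
    return float(np.mean([sup_over_sparse_sphere(u, k) for u in draws]))


n = 100
for k in (1, 5, 20, n):
    print(k, complexity_sparse_sphere(n, k))
```

For $k = n$ the estimate approaches $\mathbb{E}\|\mathbf{u}\|_2 \approx \sqrt{n}$, while for small $k$ it grows much more slowly, which is the effect the $\log(5en/k)$ factor in Lemmas 4 and 5 captures.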

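Given a subset $\mathcal{V} \subset \mathbb{R}^n$, we aim to measure the complexity of $\mathcal{W}(\mathcal{V})$, the image of $\mathcal{V}$ under the fixed linear mapping $\Sigma$. More precisely, we define

$$\mathcal{W}(\mathcal{V}) = \{\mathbf{w} \in \mathbb{R}^n : \mathbf{w} = \Sigma\mathbf{v} \text{ for some } \mathbf{v} \in \mathcal{V}\}. \tag{31}$$

Define the complexity of $\mathcal{W}(\mathcal{V})$ as $\ell_*(\mathcal{W}(\mathcal{V})) = \mathbb{E}\bigl[\sup_{\mathbf{v} \in \mathcal{V}} |\langle \mathbf{v}, \Sigma^{\top}\mathbf{u}\rangle|\bigr]$.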

Lemma 4 (Upper bound on complexity measure, Lemma B.6 in [19]). Let $\mathcal{N}_k$ be a $\frac{1}{2}$-net of $\mathcal{U}_k$ provided by Lemma 3. Then for all $1 \le k \le n$, it holds that

$$\ell_*(\mathcal{W}(\mathcal{U}_k)) \le 6\sqrt{\rho_{\max}(k)\, k \log\frac{5en}{k}}, \tag{32}$$

where $\rho_{\max}(k)$ is the $k$-restricted maximum eigenvalue of $\Sigma^*\Sigma$ defined in (15).
Define the set

$$\mathcal{E}_k \triangleq \{\mathbf{v} \in \mathbb{R}^n : \|\Sigma\mathbf{v}\|_2 = 1,\ \|\mathbf{v}\|_0 \le k\}; \tag{33}$$

then, for $\mathcal{V} = \mathcal{E}_k$, the complexity measure of the set $\mathcal{W}(\mathcal{E}_k)$ is bounded in the following lemma.

Lemma 5. The complexity measure of the set $\mathcal{W}(\mathcal{E}_k)$ is upper bounded as

$$\ell_*(\mathcal{W}(\mathcal{E}_k)) \le 6\sqrt{k\,\frac{\rho_{\max}(k)}{\rho_{\min}(k)}\log\frac{5en}{k}}, \tag{34}$$

where $\rho_{\max}(k)$ and $\rho_{\min}(k)$ are defined in (15).

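Proof: For any vector $\mathbf{v} \in \mathcal{E}_k$ and any random vector $\mathbf{u} \in \mathbb{R}^n$, we have with probability one that

$$|\langle \mathbf{u}, \Sigma\mathbf{v}\rangle| = |\langle \mathbf{v}, \Sigma^{\top}\mathbf{u}\rangle| = \|\mathbf{v}\|_2\left|\left\langle \frac{\mathbf{v}}{\|\mathbf{v}\|_2}, \Sigma^{\top}\mathbf{u}\right\rangle\right| \le \|\mathbf{v}\|_2 \sup_{\mathbf{r}\in\mathcal{U}_k}|\langle \mathbf{r}, \Sigma^{\top}\mathbf{u}\rangle|, \tag{35}$$

where the inequality follows from $\{\mathbf{v}/\|\mathbf{v}\|_2 : \mathbf{v} \in \mathcal{E}_k\} \subset \mathcal{U}_k$. Moreover, every $\mathbf{v} \in \mathcal{E}_k$ satisfies $\rho_{\min}(k)\|\mathbf{v}\|_2^2 \le \|\Sigma\mathbf{v}\|_2^2 = 1$, so $\sup_{\mathbf{v}\in\mathcal{E}_k}\|\mathbf{v}\|_2 \le 1/\sqrt{\rho_{\min}(k)}$. Hence, from Lemma 4,

$$\mathbb{E}\Bigl[\sup_{\mathbf{v}\in\mathcal{E}_k}|\langle \mathbf{u}, \Sigma\mathbf{v}\rangle|\Bigr] \overset{(a)}{\le} \sup_{\mathbf{v}\in\mathcal{E}_k}\|\mathbf{v}\|_2\,\mathbb{E}\Bigl[\sup_{\mathbf{r}\in\mathcal{U}_k}|\langle \mathbf{r}, \Sigma^{\top}\mathbf{u}\rangle|\Bigr] \overset{(b)}{\le} 6\sqrt{k\,\frac{\rho_{\max}(k)}{\rho_{\min}(k)}\log\frac{5en}{k}}, \tag{36}$$

where (a) comes from (35) and the definition (33), and (b) follows from Lemma 4 and the definitions in (15). ■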
Step 1: Isometry on the images of sparse vectors. We consider the case in which the sensor data and all matrices are real. In this step, we first show in Lemma 6 that all row vectors of the matrix $Z$ are isotropic sub-Gaussian (see Definition 7 below). Then we use Lemma 5 to obtain an isometry on the images of sparse vectors.

Definition 6 (Sub-Gaussian random variables [13]). Let $X$ be a zero-mean random variable with unit variance. It is sub-Gaussian if there exists a positive number $g$ such that, for any $t > 0$,

$$\mathbb{P}(|X| > t) \le 2\exp\!\left(-\frac{t^2}{2g^2}\right).$$

The sub-Gaussian norm $\|X\|_{\psi_2}$ is the smallest number $g$ for which the above inequality holds.

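Definition 7 (Isotropic sub-Gaussian random vectors [13]). Let $\mathbf{u}$ be a random vector in $\mathbb{R}^n$. If $\mathbb{E}[\mathbf{u}\mathbf{u}^{\top}] = I_n$, then $\mathbf{u}$ is called isotropic. The random vector $\mathbf{u}$ is sub-Gaussian with constant $\alpha$ if

$$\sup_{\mathbf{r}\in\mathbb{R}^n:\,\|\mathbf{r}\|_2 = 1}\|\langle \mathbf{u}, \mathbf{r}\rangle\|_{\psi_2} \le \alpha.$$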
Lemma 6. Let $\mathbf{u} \in \mathbb{R}^n$ be a random vector with i.i.d. elements, each distributed as $\mathcal{N}(0, 1/p, p)$. Then $\mathbf{u}$ is isotropic sub-Gaussian with constant $\alpha = c_0/\sqrt{p}$, where $c_0$ is an absolute constant.

Proof: Since all elements of $\mathbf{u}$ are independent zero-mean random variables with unit variance, we have $\mathbb{E}[\mathbf{u}\mathbf{u}^{\top}] = I_n$. Let $X \sim \mathcal{N}(0, 1/p, p)$ be a mixed Gaussian random variable with the pdf defined earlier, i.e., $X = 0$ with probability $1 - p$ and $X \sim \mathcal{N}(0, 1/p)$ with probability $p$. Then, for every $t > 0$,

$$\mathbb{P}(|X| > t) = 2p\int_t^{\infty}\sqrt{\frac{p}{2\pi}}\exp\!\left(-\frac{px^2}{2}\right)dx \overset{(a)}{\le} p\,e^{-pt^2/2} \overset{(b)}{\le} 2e^{-pt^2/2},$$

where (a) follows from the Chernoff bound on the Gaussian Q-function, and (b) from $p \in (0,1]$. Hence, the sub-Gaussian norm of $X$ is bounded above by $1/\sqrt{p}$. From Lemma 5.24 in [13], the vector $\mathbf{u}$ is sub-Gaussian with constant $\alpha = c_0/\sqrt{p}$, where $c_0$ is an absolute constant. ■
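The tail computation in the proof of Lemma 6 is easy to check numerically. Below, $\mathcal{N}(0, 1/p, p)$ is read as in the proof (zero with probability $1-p$, otherwise $\mathcal{N}(0,1/p)$), so that $\mathbb{P}(|X| > t) = 2p\,Q(t\sqrt{p})$ exactly. This minimal sketch (hypothetical names) compares that exact tail against bounds (a) and (b) and checks the unit variance by sampling.

```python
import math

import numpy as np


def mixed_gaussian_tail(t, p):
    """Exact P(|X| > t), t > 0, for X ~ N(0, 1/p, p): 2 p Q(t sqrt(p))."""
    return p * math.erfc(t * math.sqrt(p) / math.sqrt(2.0))


def bound_a(t, p):
    """Bound (a): p * exp(-p t^2 / 2), from Q(x) <= exp(-x^2 / 2) / 2."""
    return p * math.exp(-p * t**2 / 2.0)


def bound_b(t, p):
    """Bound (b): 2 * exp(-p t^2 / 2), i.e., sub-Gaussian norm at most 1/sqrt(p)."""
    return 2.0 * math.exp(-p * t**2 / 2.0)


def sample_mixed_gaussian(p, size, rng):
    """Zero with probability 1 - p, otherwise a N(0, 1/p) draw."""
    active = rng.random(size) < p
    return np.where(active, rng.normal(0.0, 1.0 / math.sqrt(p), size), 0.0)


x = sample_mixed_gaussian(0.8, 400_000, np.random.default_rng(0))
print(x.mean(), x.var())   # close to 0 and 1
```

The factor $1/p$ in the active-component variance is what keeps the overall variance at one, at the price of heavier tails as $p$ shrinks, hence the $1/\sqrt{p}$ scaling of $\alpha$.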
Recall that the signal model is $\mathbf{y} = Z\Sigma\mathbf{x} + \mathbf{e}$. All elements of the matrix $Z$ are i.i.d. with distribution $\mathcal{N}(0, 1/(mp), p)$. Then Lemma 6 implies that all row vectors of the scaled matrix $\sqrt{m}Z$ are independent and isotropic sub-Gaussian with constant $\alpha = c_0/\sqrt{p}$. The key idea in proving Theorem 1 is to apply a result in [14], which is given without proof as follows.
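Isotropy of the rows of $\sqrt{m}Z$ implies $\mathbb{E}[Z^{\top}Z] = I_n$, which a quick simulation can illustrate. The sketch below assumes the same reading of $\mathcal{N}(0, 1/(mp), p)$ as in Lemma 6 (zero with probability $1-p$, otherwise Gaussian with variance $1/(mp)$); the dimensions are illustrative and the function name is hypothetical.

```python
import numpy as np


def sample_Z(m, n, p, rng):
    """m x n matrix with i.i.d. entries: 0 w.p. 1 - p, else N(0, 1/(m p))."""
    active = rng.random((m, n)) < p
    gauss = rng.normal(0.0, 1.0 / np.sqrt(m * p), size=(m, n))
    return np.where(active, gauss, 0.0)


rng = np.random.default_rng(0)
Z = sample_Z(20_000, 5, 0.8, rng)
gram = Z.T @ Z        # equals (1/m) sum_i b_i b_i^T for the rows b_i of sqrt(m) Z
print(np.round(gram, 3))
```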

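Lemma 7 (Theorem 2.1 in [13]). Set $1 \le m \le n$ and $0 < \beta < 1$. Let $\mathbf{b}$ be an isotropic sub-Gaussian random vector on $\mathbb{R}^n$ with constant $\alpha \ge 1$. Let $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_m$ be independent copies of $\mathbf{b}$, and let the random matrix $B$ have rows $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_m$. Let $\mathcal{V} \subset S^{n-1}$. If $m$ satisfies

$$m \ge \frac{c'\alpha^4}{\beta^2}\,\ell_*(\mathcal{V})^2,$$

then with probability at least $1 - \exp(-c\beta^2 m/\alpha^4)$, for all $\mathbf{v} \in \mathcal{V}$, we have

$$1 - \beta \le \frac{\|B\mathbf{v}\|_2^2}{m} \le 1 + \beta,$$

where $c'$ and $c$ are positive absolute constants.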

Recall the definitions in (31) and (33), and set $\mathcal{V} = \mathcal{W}(\mathcal{E}_k)$. Then, from Lemma 5, Lemma 6, and Lemma 7, we obtain the following result: if the number of measurements satisfies

$$m \ge \frac{c_1 k\,\rho_{\max}(k)}{p^2\beta^2\rho_{\min}(k)}\log\frac{5en}{k}, \tag{37}$$

then with probability at least $1 - \exp(-c_2\beta^2p^2m/4)$, for all $\mathbf{v} \in \mathcal{E}_k$, we have

$$1 - \beta \le \|Z\Sigma\mathbf{v}\|_2^2 \le 1 + \beta, \tag{38}$$

where $c_1 = 36c'c_0^4$ and $c_2 = c/c_0^4$ are positive absolute constants.

Furthermore, by replacing $\mathbf{v}$ with the $\Sigma$-normalized vector $\mathbf{v}/\|\Sigma\mathbf{v}\|_2$ in (38), we obtain that, for any $k$-sparse $\mathbf{v}$,

$$(1 - \beta)\|\Sigma\mathbf{v}\|_2^2 \le \|Z\Sigma\mathbf{v}\|_2^2 \le (1 + \beta)\|\Sigma\mathbf{v}\|_2^2 \tag{39}$$

holds with probability at least $1 - \exp(-c_2\beta^2p^2m/4)$.

Step 2: Restricted isometry property. From (39) and the definitions of the $k$-restricted extreme eigenvalues in (15), for any $k$-sparse vector $\mathbf{x}$, the inequality

$$(1-\beta)\rho_{\min}(k)\|\mathbf{x}\|_2^2 \le \|Z\Sigma\mathbf{x}\|_2^2 \le (1+\beta)\rho_{\max}(k)\|\mathbf{x}\|_2^2 \tag{40}$$

holds with probability at least $1 - \exp(-c_2 m p^2\beta^2/4)$.

Recall the parameters defined prior to Theorem 1, in particular $\beta_k$ and $\delta_k$. As (40) shows, the LHS and the RHS may deviate differently from one. Hence, the maximum operation and piecewise linear mappings are used in those definitions, so that after some simple substitutions and algebraic manipulations, the inequality

$$(1-\delta_k)\|\mathbf{x}\|_2^2 \le \|Z\Sigma\mathbf{x}\|_2^2 \le (1+\delta_k)\|\mathbf{x}\|_2^2 \tag{41}$$

holds with probability at least $1 - \exp(-c_2 m p^2\beta_k^2/4)$. Collecting the results in (37) and (41), we obtain Theorem 1 for the real case.

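Step 3: Generalization to the complex case. We now generalize the above RIP result to the complex case. First, we show that the matrix $Z\Sigma$ satisfies the RIP for complex data $\mathbf{x} = \mathbf{x}^R + j\mathbf{x}^I$. With probability at least $1 - \exp(-c_2 m p^2\beta_k^2/4)$, we have

$$(1-\delta_k)\|\mathbf{x}^R\|_2^2 \le \|Z\Sigma\mathbf{x}^R\|_2^2 \le (1+\delta_k)\|\mathbf{x}^R\|_2^2,$$
$$(1-\delta_k)\|\mathbf{x}^I\|_2^2 \le \|Z\Sigma\mathbf{x}^I\|_2^2 \le (1+\delta_k)\|\mathbf{x}^I\|_2^2.$$

Combining the above two inequalities yields

$$(1-\delta_k)\|\mathbf{x}\|_2^2 \le \|Z\Sigma\mathbf{x}\|_2^2 \le (1+\delta_k)\|\mathbf{x}\|_2^2.$$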
Second, we show that when the sensing matrix $A$ in our scheme is a complex random matrix, it still satisfies the RIP. Let $A = A^R + jA^I$. It is assumed that the real part $A^R$ and the imaginary part $A^I$ are independent and have the same probability distribution. Recall that the sensing matrix is $A = Z\Sigma$. For any $k$-sparse complex vector $\mathbf{x}$, we have

$$\tfrac{1}{2}(1-\delta_k)\|\mathbf{x}\|_2^2 \le \|A^R\mathbf{x}\|_2^2 \le \tfrac{1}{2}(1+\delta_k)\|\mathbf{x}\|_2^2,$$
$$\tfrac{1}{2}(1-\delta_k)\|\mathbf{x}\|_2^2 \le \|A^I\mathbf{x}\|_2^2 \le \tfrac{1}{2}(1+\delta_k)\|\mathbf{x}\|_2^2.$$

Combining the above two inequalities yields the RIP in (14) for the general complex case. ■



B. Proof of Theorem 2

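Proof: Clearly, $\rho_{\max}(1,n) = \rho_{\min}(1,n) = 1$, so the bounds are satisfied for $k = 1$. We will first prove Theorem 2 for the case $k = 2$. Subsequently, we generalize the result to arbitrary $2 < k \le \lfloor n/2 \rfloor$. Let the two nonzero elements be $v_{s_1} = A_1e^{j\theta_1}$ and $v_{s_2} = A_2e^{j\theta_2}$, where $A_1^2 + A_2^2 = 1$ (because $\|\mathbf{v}\|_2 = 1$). Then from (27) and the fact that $P_{\mathrm{ave}} = \sum_{i=1}^n \gamma_i/n$, we obtain

$$\|\mathbf{w}\|_2^2 = \frac{1}{nP_{\mathrm{ave}}}\sum_{i=1}^n\gamma_i + \frac{2A_1A_2}{nP_{\mathrm{ave}}}\sum_{i=1}^n\gamma_i\cos\!\left(\theta + \frac{2\pi(i-1)\Delta}{n}\right) = 1 + 2A_1A_2\,\frac{\sum_{i=1}^n a_i\gamma_i}{\sum_{i=1}^n\gamma_i}, \tag{42}$$

where $\theta = \theta_1 - \theta_2 \in (0, 2\pi]$, $\Delta = s_2 - s_1 \in \{1, \ldots, n-1\}$, and $a_i = \cos(\theta + 2\pi(i-1)\Delta/n)$. We now set $X_i = \gamma_i$ to emphasize that the signal powers are random variables. Recall that the $X_i$'s are truncated Gaussian, $X_i \sim \mathcal{N}_{\mathrm{tr}}(\mu, \omega^2)$. We consider the random variable

$$S_n \triangleq \frac{\sum_{i=1}^n a_iX_i}{\sum_{i=1}^n X_i}. \tag{43}$$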

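We define the Cesàro sum of the $a_i$'s as

$$\bar{a}_n \triangleq \frac{1}{n}\sum_{i=1}^n a_i = \frac{1}{n}\sum_{i=1}^n\cos\!\left(\theta + \frac{2\pi(i-1)\Delta}{n}\right), \tag{44}$$

and note that as $n \to \infty$, the Cesàro sum converges. Indeed, we have

$$\bar{a}_n \to \bar{a} = \frac{1}{2\pi\Delta}\int_0^{2\pi\Delta}\cos(\theta + u)\,du = 0. \tag{45}$$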
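The Cesàro sum (44) can be evaluated directly. For integer $\Delta \in \{1, \ldots, n-1\}$ the terms are the real parts of $e^{j\theta}$ times a full sum of powers of a nontrivial $n$-th root of unity, so $\bar{a}_n$ is in fact exactly zero for every finite $n$, consistent with the limit (45). A minimal NumPy check (the function name is hypothetical):

```python
import numpy as np


def cesaro_mean(theta, delta, n):
    """bar a_n = (1/n) * sum_{i=1}^n cos(theta + 2 pi (i - 1) Delta / n), as in (44)."""
    i = np.arange(1, n + 1)
    return float(np.mean(np.cos(theta + 2.0 * np.pi * (i - 1) * delta / n)))


rng = np.random.default_rng(0)
n = 500
worst = max(abs(cesaro_mean(rng.uniform(0, 2 * np.pi), delta, n)) for delta in range(1, n))
print(worst)   # at the level of floating-point rounding
```

The excluded case $\Delta = n$ would give $\bar{a}_n = \cos\theta$, which is why the proof needs $\Delta$ to range over $\{1, \ldots, n-1\}$.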
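We now bound the probability that $S_n$ exceeds some $t > 0$ by considering the chain of inequalities

$$\begin{aligned}
\mathbb{P}(S_n > t) &= \mathbb{P}\left(\frac{\sum_{i=1}^n a_iX_i}{\sum_{i=1}^n X_i} > t\right) \overset{(a)}{=} \mathbb{P}\left(\sum_{i=1}^n a_iX_i > t\sum_{i=1}^n X_i\right)\\
&\overset{(b)}{\le} \mathbb{P}\left(\sum_{i=1}^n a_iX_i > t\sum_{i=1}^n X_i,\ \frac{1}{n}\sum_{j=1}^n X_j \ge \tau\mu\right) + \mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n X_i < \tau\mu\right)\\
&\overset{(c)}{\le} \mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n a_iX_i > t\tau\mu\right) + \mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n X_i < \tau\mu\right),
\end{aligned} \tag{46}$$

where $\tau \in (0,1)$ is a free parameter, (a) is due to the fact that the $X_i$'s are nonnegative random variables (so that $\sum_i X_i > 0$ almost surely), (b) follows from the fact that $\mathbb{P}(\mathcal{A}) = \mathbb{P}(\mathcal{A}\cap\mathcal{B}) + \mathbb{P}(\mathcal{A}\cap\mathcal{B}^c) \le \mathbb{P}(\mathcal{A}\cap\mathcal{B}) + \mathbb{P}(\mathcal{B}^c)$, and (c) comes from the monotonicity of measure. In the following, we bound the two terms in (46) using the theory of large deviations [11].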

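Define $t' = t\tau\mu$ and let $s$ be an arbitrary non-negative number. Then, from Markov's inequality, the first term in (46) can be upper bounded as follows:

$$\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n a_iX_i > t'\right) \le \exp(-nst')\,\mathbb{E}\left[\exp\left(s\sum_{i=1}^n a_iX_i\right)\right], \tag{47}$$

which implies, by the independence of the $X_i$'s, that

$$\frac{1}{n}\log\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n a_iX_i > t'\right) \le -st' + \frac{1}{n}\sum_{i=1}^n\log\mathbb{E}\bigl[\exp(sa_iX_i)\bigr]. \tag{48}$$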



To bound the sum in (48), we find the cumulant-generating function (CGF) of $X \sim \mathcal{N}_{\mathrm{tr}}(\mu, \omega^2)$ in terms of that of a Gaussian with mean $\mu$ and variance $\omega^2$. By simple algebraic manipulations, we have

$$\log\mathbb{E}[\exp(sX)] = \mu s + \frac{\omega^2s^2}{2} + \varphi(\mu,\omega,s), \tag{49}$$

where $\varphi(\mu,\omega,s) = \log\bigl(1 - Q(\mu/\omega + \omega s)\bigr) - \log\bigl(1 - Q(\mu/\omega)\bigr)$. We note that, given that $(\mu,\omega)$ is a positive pair of numbers, $s \mapsto \varphi(\mu,\omega,s)$ for $s > 0$ is concave, because $s \mapsto -Q(\mu/\omega + \omega s)$ (for $\mu/\omega > 0$) and $t \mapsto \log(1+t)$ are both concave and the latter function is non-decreasing. Moreover, $s \mapsto \varphi(\mu,\omega,s)$ is continuous for each positive $(\mu,\omega)$ pair, because every concave function on an open set is continuous. Note that $\varphi(\mu,\omega,0) = 0$.
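The closed form (49) and the properties of $\varphi$ used later ($\varphi(\mu,\omega,0) = 0$, $\varphi(\mu,\omega,-s) \le 0$ for $s > 0$, and concavity) can be checked numerically. The sketch below compares (49) with a direct Riemann-sum evaluation of $\log\mathbb{E}[e^{sX}]$ under the density (28); $\mu = 0.2$ and $\omega = 0.1$ are illustrative, and the function names are hypothetical.

```python
import math

import numpy as np


def q_function(x):
    """Standard Gaussian tail Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phi_term(mu, omega, s):
    """phi(mu, omega, s) = log(1 - Q(mu/omega + omega s)) - log(1 - Q(mu/omega))."""
    base = 1.0 - q_function(mu / omega)
    shifted = 1.0 - q_function(mu / omega + omega * s)
    return math.log(shifted) - math.log(base)


def cgf_closed_form(s, mu, omega):
    """Right-hand side of (49)."""
    return mu * s + 0.5 * omega**2 * s**2 + phi_term(mu, omega, s)


def cgf_numerical(s, mu, omega, points=400_001):
    """log E[exp(s X)] for X ~ N_tr(mu, omega^2), via a Riemann sum of density (28)."""
    upper = mu + max(s, 0.0) * omega**2 + 12.0 * omega   # covers the tilted density
    x = np.linspace(0.0, upper, points)
    norm = math.sqrt(2.0 * math.pi) * omega * (1.0 - q_function(mu / omega))
    integrand = np.exp(s * x - (x - mu) ** 2 / (2.0 * omega**2)) / norm
    return math.log(float(np.sum(integrand) * (x[1] - x[0])))


mu, omega = 0.2, 0.1
for s in (-3.0, 0.0, 2.0, 5.0):
    print(s, cgf_closed_form(s, mu, omega), cgf_numerical(s, mu, omega))
```

The correction $\varphi$ is the log-ratio of the truncation probabilities before and after exponential tilting, which is why it vanishes at $s = 0$.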
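Substituting the CGF of the truncated Gaussian distribution in (49) into (48) yields

$$\begin{aligned}
\frac{1}{n}\log\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n a_iX_i > t'\right)
&\le -st' + \mu s\bar{a}_n + \frac{\omega^2s^2}{2n}\sum_{i=1}^n a_i^2 + \frac{1}{n}\sum_{i=1}^n\varphi(\mu,\omega,a_is)\\
&\overset{(a)}{=} -st' + \mu s\bar{a}_n + \frac{\omega^2s^2}{4} + \frac{\omega^2s^2}{4n}\sum_{i=1}^n\cos\!\left(2\theta + \frac{4\pi\Delta(i-1)}{n}\right) + \frac{1}{n}\sum_{i=1}^n\varphi(\mu,\omega,a_is)\\
&\overset{(b)}{\le} -st' + \mu s\bar{a}_n + \frac{\omega^2s^2}{4} + \frac{\omega^2s^2}{4n}\sum_{i=1}^n\cos\!\left(2\theta + \frac{4\pi\Delta(i-1)}{n}\right) + \varphi\!\left(\mu,\omega,\frac{s}{n}\sum_{i=1}^n a_i\right),
\end{aligned} \tag{50}$$

where (a) comes from the definition of $a_i$ and the double-angle formula for the cosine, and (b) follows from the fact that $\varphi(\mu,\omega,s)$ is concave in $s$ for any positive $(\mu,\omega)$ pair (Jensen's inequality).

Taking the limsup on both sides of (50) and using the definition of $\bar{a}_n$ yields

$$\begin{aligned}
\limsup_{n\to\infty}\frac{1}{n}\log\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n a_iX_i > t'\right)
&\overset{(a)}{\le} -st' + \frac{\omega^2s^2}{4} + \frac{\omega^2s^2}{16\pi\Delta}\int_0^{4\pi\Delta}\cos(2\theta + u)\,du + \limsup_{n\to\infty}\varphi(\mu,\omega,\bar{a}_ns)\\
&\overset{(b)}{=} -st' + \frac{\omega^2s^2}{4} + \limsup_{n\to\infty}\varphi(\mu,\omega,\bar{a}_ns)\\
&\overset{(c)}{=} -st' + \frac{\omega^2s^2}{4} \triangleq f(s),
\end{aligned} \tag{51}$$

where (a) follows from Riemann sums (together with $\bar{a}_n \to 0$), (b) comes from the fact that the cosine has zero mean over an integer number of periods (note $\Delta \in \mathbb{Z}$), and (c) follows from the continuity of $\varphi(\mu,\omega,s)$, (45), and $\varphi(\mu,\omega,0) = 0$. Note that the minimum of $f(s)$ in (51) is $f(s^*) = -\tau^2d^2t^2$, attained at $s^* = 2t'/\omega^2$. Hence,

$$\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n a_iX_i > t'\right) \,\dot{\le}\, \exp\bigl[-n\tau^2d^2t^2\bigr]. \tag{52}$$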



The second term in (46) can be bounded using standard techniques from large deviations theory [11] (Cramér's theorem) and along the same lines as the derivation above. As such, we have

$$\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n X_i < \tau\mu\right) \overset{(a)}{\le} \exp\Bigl[-n\bigl(s\mu(1-\tau) - \omega^2s^2/2 - \varphi(\mu,\omega,-s)\bigr)\Bigr] \overset{(b)}{\le} \exp\Bigl[-n\bigl(s\mu(1-\tau) - \omega^2s^2/2\bigr)\Bigr],$$

where (a) follows from using the CGF of $X_i$ in (49), and (b) follows from the fact that $\varphi(\mu,\omega,-s) \le 0$ for all $s > 0$. Hence, setting $s = \mu(1-\tau)/\omega^2$, we have

$$\mathbb{P}\left(\frac{1}{n}\sum_{i=1}^n X_i < \tau\mu\right) \le \exp\bigl[-n(1-\tau)^2d^2/2\bigr]. \tag{53}$$



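Combining the two terms in (46), we have from (52) and (53), and the principle that a sum of two exponentially decaying terms is dominated by the slower-decaying one, that

$$\mathbb{P}(S_n > t) \,\dot{\le}\, \exp\Bigl[-n\min\bigl\{\tau^2d^2t^2,\ (1-\tau)^2d^2/2\bigr\}\Bigr]. \tag{54}$$

Since $\tau \in (0,1)$ is a free parameter, we can set it to $\tau^* = 1/(1+\sqrt{2}\,t)$, which equalizes the two exponents. Substituting $\tau^*$ into (54) yields

$$\mathbb{P}(S_n > t) \,\dot{\le}\, \exp\bigl[-nd^2\tilde{t}^2\bigr], \tag{55}$$

where $\tilde{t} = t/(1+\sqrt{2}\,t)$. By symmetry, we can also conclude that

$$\mathbb{P}(S_n < -t) \,\dot{\le}\, \exp\bigl[-nd^2\tilde{t}^2\bigr]. \tag{56}$$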
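The choice of $\tau^*$ can be confirmed by brute force: maximizing the exponent in (54) over a fine grid of $\tau \in (0,1)$ recovers $\tau^* = 1/(1+\sqrt{2}\,t)$ and the value $\tilde{t}^2$ (per unit $d^2$). A minimal NumPy sketch with hypothetical names:

```python
import math

import numpy as np


def exponent_54(tau, t):
    """Exponent of (54) divided by d^2: min{tau^2 t^2, (1 - tau)^2 / 2}."""
    return np.minimum(tau**2 * t**2, (1.0 - tau) ** 2 / 2.0)


def tau_star(t):
    """Balancing choice tau* = 1 / (1 + sqrt(2) t)."""
    return 1.0 / (1.0 + math.sqrt(2.0) * t)


taus = np.linspace(0.0, 1.0, 1_000_001)
for t in (0.04, 0.5, 2.0):
    values = exponent_54(taus, t)
    best = taus[np.argmax(values)]
    t_tilde = t / (1.0 + math.sqrt(2.0) * t)
    print(t, best, tau_star(t), values.max(), t_tilde**2)
```

Since the first term increases and the second decreases in $\tau$, the minimum of the two is largest exactly where they cross, which is the balancing argument used to pick $\tau^*$.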
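Recall that $\rho_{\max}(k,n)$ is the maximum value of $\|\mathbf{w}\|_2^2 = \|\Sigma\mathbf{v}\|_2^2$ over all unit-norm $k$-sparse vectors $\mathbf{v}$. From (42), for a given $S_n$, $\|\mathbf{w}\|_2^2$ depends only on $A_1A_2$. Note that $0 \le A_1A_2 \le 1/2$ because $\sqrt{A_1^2A_2^2} \le (A_1^2 + A_2^2)/2 = 1/2$. We set $A_1A_2 = 1/2$, whence $\|\mathbf{w}\|_2^2$ attains its maximum value. From (42), (55), and (56),

$$\mathbb{P}\bigl(\rho_{\max}(2,n) > 1 + t\bigr) \,\dot{\le}\, \exp\bigl[-nd^2\tilde{t}^2\bigr],\qquad \mathbb{P}\bigl(\rho_{\min}(2,n) < 1 - t\bigr) \,\dot{\le}\, \exp\bigl[-nd^2\tilde{t}^2\bigr]. \tag{57}$$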
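The $k = 2$ case can also be simulated cheaply. Under an illustrative assumption not stated in this section, $\Sigma = \mathrm{diag}(\sqrt{\gamma_i/P_{\mathrm{ave}}})\,\Psi$ with $\Psi$ the unitary DFT, the matrix $\Sigma^*\Sigma$ is circulant with unit diagonal, so every $2\times 2$ principal submatrix has eigenvalues $1 \pm |c(\Delta)|$ with $c(\Delta) = \sum_i \gamma_i e^{j2\pi(i-1)\Delta/n}/\sum_i\gamma_i$; maximizing (42) over $\theta$ with $A_1A_2 = 1/2$ gives the same $1 + |c(\Delta)|$. The sketch below (hypothetical names; $\mu = 0.2$ as in Section V; $k = 2$ rather than the $k = 5$ of Fig. 6) estimates $\mathbb{P}(\rho_{\max}(2,n) > 1 + t)$ by Monte Carlo and cross-checks the shortcut against a brute-force search for a small $n$.

```python
import itertools

import numpy as np


def rho_extremes_k2(gamma):
    """(rho_max(2, n), rho_min(2, n)), ASSUMING Sigma = diag(sqrt(gamma/P_ave)) @ unitary DFT."""
    # np.fft.fft uses e^{-j...}; since gamma is real, |fft(gamma)[Delta]| = |c(Delta)| * sum(gamma).
    c = np.fft.fft(gamma) / np.sum(gamma)
    peak = float(np.max(np.abs(c[1:])))        # Delta = 1, ..., n - 1
    return 1.0 + peak, 1.0 - peak


def rho_extremes_k2_bruteforce(gamma):
    """Same quantities by scanning all 2x2 principal submatrices of Sigma^* Sigma."""
    n = gamma.size
    dft = np.fft.fft(np.eye(n), norm="ortho")  # symmetric unitary DFT matrix
    sigma = np.sqrt(gamma / gamma.mean())[:, None] * dft
    gram = sigma.conj().T @ sigma
    eigs = [np.linalg.eigvalsh(gram[np.ix_(pair, pair)])
            for pair in itertools.combinations(range(n), 2)]
    return max(e[-1] for e in eigs), min(e[0] for e in eigs)


def exceed_prob(n, d, t, mu=0.2, trials=400, seed=0):
    """Monte Carlo estimate of P(rho_max(2, n) > 1 + t), gamma_i ~ N_tr(mu, (mu/d)^2)."""
    rng = np.random.default_rng(seed)
    omega = mu / d
    hits = 0
    for _ in range(trials):
        gamma = np.empty(0)
        while gamma.size < n:                  # rejection sampling of N_tr(mu, omega^2)
            draws = rng.normal(mu, omega, size=2 * n)
            gamma = np.concatenate([gamma, draws[draws >= 0.0]])
        hits += rho_extremes_k2(gamma[:n])[0] > 1.0 + t
    return hits / trials


for d in (1, 2, 3):
    print(d, exceed_prob(64, d, 0.12))
```

The FFT shortcut replaces a search over all $\binom{n}{2}$ supports with a single transform, which makes sweeps over $n$ and $d$ inexpensive for this special case.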

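Having proved the result for the case $k = 2$, we now generalize it to the case $k > 2$. Set the nonzero elements of the vector $\mathbf{v}$ to be $v_{s_q} = A_qe^{j\theta_q}$, $q = 1, \ldots, k$, where $\sum_{q=1}^k A_q^2 = 1$. Equation (27) can be written as

$$\begin{aligned}
\|\mathbf{w}\|_2^2 &= \frac{1}{nP_{\mathrm{ave}}}\sum_{i=1}^n\gamma_i\left(1 + \sum_{q=1}^k\sum_{l=1,\,l\neq q}^k A_qA_l\cos\!\left(\theta_{q,l} + \frac{2\pi(i-1)\Delta_{q,l}}{n}\right)\right)\\
&= 1 + \sum_{q=1}^k\sum_{l=1,\,l\neq q}^k A_qA_l\,\frac{\sum_{i=1}^n\gamma_i\cos\bigl(\theta_{q,l} + 2\pi(i-1)\Delta_{q,l}/n\bigr)}{\sum_{i=1}^n\gamma_i}\\
&= 1 + \sum_{q=1}^k\sum_{l=1,\,l\neq q}^k A_qA_l\,S_n^{(q,l)} \triangleq 1 + B_n,
\end{aligned} \tag{58}$$

where $S_n^{(q,l)}$ is defined as in (43) but involves the $q$-th and the $l$-th nonzero elements of $\mathbf{v}$, i.e., $\theta_{q,l} = \theta_q - \theta_l$ and $\Delta_{q,l} = s_l - s_q$. On the other hand, we can bound $B_n^2$ as follows:

$$\begin{aligned}
B_n^2 &\overset{(a)}{\le} \left(\sum_{q=1}^k\sum_{l=1,\,l\neq q}^k A_q^2A_l^2\right)\left(\sum_{q=1}^k\sum_{l=1,\,l\neq q}^k\bigl(S_n^{(q,l)}\bigr)^2\right)
= \left(\Bigl(\sum_{q=1}^k A_q^2\Bigr)^2 - \sum_{q=1}^k A_q^4\right)\left(\sum_{q=1}^k\sum_{l=1,\,l\neq q}^k\bigl(S_n^{(q,l)}\bigr)^2\right)\\
&\overset{(b)}{\le} \frac{k-1}{k}\sum_{q=1}^k\sum_{l=1,\,l\neq q}^k\bigl(S_n^{(q,l)}\bigr)^2,
\end{aligned} \tag{59}$$

where (a) comes from the Cauchy-Schwarz inequality and (b) comes from the basic inequality relating the arithmetic and quadratic means, namely $\frac{1}{M}\sum_{j=1}^M a_j \le \bigl(\frac{1}{M}\sum_{j=1}^M a_j^2\bigr)^{1/2}$, which, applied to $a_q = A_q^2$ with $M = k$, gives $\sum_{q=1}^k A_q^4 \ge 1/k$.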
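Step (b) of (59) rests on $\sum_{q\neq l}A_q^2A_l^2 = 1 - \sum_q A_q^4 \le (k-1)/k$ for unit-norm amplitudes, with equality when all $A_q = 1/\sqrt{k}$. A quick randomized check (NumPy; hypothetical names):

```python
import numpy as np


def cross_weight(amplitudes):
    """sum_{q != l} A_q^2 A_l^2, computed as (sum_q A_q^2)^2 - sum_q A_q^4."""
    a2 = np.asarray(amplitudes, dtype=float) ** 2
    return float(a2.sum() ** 2 - np.sum(a2**2))


rng = np.random.default_rng(0)
max_gap = 0.0                     # largest violation of the bound (should stay <= 0)
for k in range(2, 11):
    for _ in range(1000):
        a = rng.standard_normal(k)
        a /= np.linalg.norm(a)    # enforce sum_q A_q^2 = 1
        max_gap = max(max_gap, cross_weight(a) - (k - 1) / k)
print(max_gap)
```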
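Now, given any $t > 0$, we can bound the probability that $|B_n|$ exceeds $t$ as follows:

$$\begin{aligned}
\mathbb{P}(|B_n| > t) &= \mathbb{P}(B_n^2 > t^2)
\overset{(a)}{\le} \mathbb{P}\left(\sum_{q=1}^k\sum_{l=1,\,l\neq q}^k\bigl(S_n^{(q,l)}\bigr)^2 > \frac{k}{k-1}t^2\right)\\
&\le \mathbb{P}\left(\max_{q\neq l}\bigl(S_n^{(q,l)}\bigr)^2 > \frac{t^2}{(k-1)^2}\right)
\overset{(b)}{\le} \sum_{q=1}^k\sum_{l=1,\,l\neq q}^k\mathbb{P}\left(\bigl|S_n^{(q,l)}\bigr| > \frac{t}{k-1}\right),
\end{aligned} \tag{60}$$

where (a) comes from (59) and the monotonicity of measure, the unlabeled inequality holds because the double sum has $k(k-1)$ terms, and (b) comes from the union bound.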
Applying the result for $k = 2$ in (57) to (60), we have

$$\mathbb{P}(|B_n| > t) \,\dot{\le}\, k(k-1)\exp\bigl[-nd^2E(k,t)^2\bigr], \tag{61}$$

where the exponent is $E(k,t) = t/(k-1+\sqrt{2}\,t)$. Recall the definition of $\rho_{\max}(k,n)$ in (15). From (58) and (61), we conclude that

$$\mathbb{P}\bigl(\rho_{\max}(k,n) > 1 + t\bigr) \,\dot{\le}\, \exp\bigl[-nd^2E(k,t)^2\bigr]. \tag{62}$$
The analysis of $\mathbb{P}\bigl(\rho_{\min}(k,n) < 1 - t\bigr)$ proceeds mutatis mutandis. This completes the proof. ■
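As a worked check of how the $k = 2$ exponent turns into $E(k,t)$, replacing $t$ by $t/(k-1)$ in $\tilde{t} = t/(1+\sqrt{2}\,t)$ gives

$$\tilde{t}\,\Big|_{t \to t/(k-1)} = \frac{t/(k-1)}{1 + \sqrt{2}\,t/(k-1)} = \frac{t}{k-1+\sqrt{2}\,t} = E(k,t),$$

and the prefactor $k(k-1)$ in (61) does not change the exponent, since $\frac{1}{n}\log\bigl(k(k-1)\bigr) \to 0$ for fixed $k$. Setting $k = 2$ recovers $E(2,t) = \tilde{t}$, consistent with (57).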